THE ANNALS
of 
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 


On Transformations Used in the Analysis of Variance. J. H. CURTISS

On Fundamental Systems of Probabilities of a Finite Number of
Events. KAI LAI CHUNG

On the Efficient Design of Statistical Investigations. ABRAHAM WALD

Some Significance Tests for Normal Bivariate Distributions. D. S.
VILLARS AND T. W. ANDERSON

Symmetric Tests of the Hypothesis that the Mean of One Normal
Population Exceeds that of Another. HERBERT A. SIMON

On Indices of Dispersion. PAUL G. HOEL

On Serial Numbers. E. J. GUMBEL

Fitting General Gram-Charlier Series. PAUL A. SAMUELSON

A Method of Testing the Hypothesis that Two Samples are from
the Same Population. HAROLD C. MATHISEN

Notes: 


Note on the Independence of Certain Quadratic Forms. 


A Characterization of the Normal Distribution. 
News and Notices 
Special Courses in Statistical Quality Control 
Report on the New York Meeting of the Institute 


Vol. XIV, No. 2 — June, 1943 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor
A. T. CRAIG    H. HOTELLING
W. E. DEMING J. NEYMAN 
T. C. FRY W. A. SHEWHART 


WITH THE COOPERATION OF 


P. S. DWYER    P. G. HOEL
C. EISENHART    W. G. MADOW
W. FELLER    A. WALD


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore, Md., or to the Secretary of the Institute
of Mathematical Statistics, E. G. Olds, Carnegie Institute of Technology,
Pittsburgh, Pa. Changes in mailing address which are to become effective for
a given issue should be reported to the Secretary on or before the 15th of the
month preceding the month of that issue. The months of issue are March,
June, September and December. Because of war-time difficulties of publication,
issues may often be from two to four weeks late in appearing.
Subscribers are therefore requested to wait at least 30 days after month of issue
before making inquiries concerning non-delivery.


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.


Authors will ordinarily receive only galley proofs. Fifty reprints without
covers will be furnished free. Additional reprints and covers furnished at cost.


The subscription price for the ANNALS is $5.00 per year. Single copies $1.50.
Back numbers are available at $5.00 per volume, or $1.50 per single issue.


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879











ON TRANSFORMATIONS USED IN THE ANALYSIS OF VARIANCE 
By J. H. Curtiss 


Cornell University 


1. Introduction. Transformations of variates to render their distributions 
more tractable in various ways have long been used in statistics [12, chapter 
XVI]. The present extensive use of the analysis of variance, particularly as 
applied to data derived from designs such as randomized blocks and Latin 
squares, has placed new emphasis on the usefulness of such transformations.
In the more usual significance tests associated with the analysis of variance, it 
is assumed a priori that the plot yields are statistically independent normally 
distributed variates which all have the same variance, but which have possibly 
different means. The hypotheses to be tested are then concerned with relations 
among these means. But in practice, it sometimes seems appropriate to specify 
for each variate a distribution in which the variance depends functionally upon 
the mean; moreover, in such cases, the specification is generally not normal. 
For example, when the data is in the form of a series of counts or percentages, a
Poisson exponential or binomial specification may seem in order, and the vari- 
ance of either of these distributions is functionally related to the mean of the 
distribution. Before applying the usual normal theory to such data, it is 
clearly desirable to transform each variate so that normality and a stable vari- 
ance are achieved as nearly as possible. 

Various transformations have been devised to do this, and a number of articles 
explaining the nature and use of these transformations have recently been 
published.¹ However, the available literature on the subject appears to be
mainly descriptive and non-mathematical. The object of this paper is to pro- 
vide a general mathematical theory (sections 2 and 3) for certain types of trans- 
formations now in use. In the framework of this theory we shall discuss in 
particular the square root and inverse sine transformations (section 4), and also 
several logarithmic transformations (section 4 and section 5). 


2. General theory. As it arises in the analysis of variance, the problem of
stabilizing a variance functionally related to a mean may be stated as follows:
Suppose $X$ is a variate whose mean $\mu = E(X)$ is a real variable with a range $S$ of
possible values, and whose standard deviation $\sigma = \sigma_X = \sigma(\mu)$ is a function of $\mu$
not identically constant. Required, to find a function $T = f(X)$ such that
both $f(X)$ and $\sigma_T^2 = E\{[T - E(T)]^2\}$ are functionally independent of $\mu$ for $\mu$
on $S$. (By "functionally independent," we mean that $\partial f/\partial\mu \equiv 0$ and
$\partial\sigma_T^2/\partial\mu \equiv 0$ for $\mu$ on $S$.)


1 See references [1], [2], [3], [4], [5], [6], [13], [16]. 













The following line of argument is adopted in certain of the references mentioned
above ([1], [2], [3], [4]): From the relation $dT = f'(X)\,dX$, we deduce as
an approximation by some sort of summation process that $\sigma_T = f'(\mu)\sigma(\mu)$.
Setting this expression equal to a constant, say $c$, we obtain $f'(\mu) = c/\sigma(\mu)$,
so $f(x)$ is an indefinite integral of $c/\sigma(x)$. The roughness of the approximation
used here is only too apparent.² For example, if $X$ is normally distributed, then
the variance of $T = X^2$ as given by the approximation is $4\sigma^2\mu^2$, while actually
it is $4\sigma^2\mu^2 + 2\sigma^4$.
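The gap between the heuristic and the exact variance in this example can be checked numerically. The following Python sketch is an added illustration, not part of the original argument; the function names and the choice $\mu = \sigma = 1$ are assumptions made only for the demonstration.

```python
import random
import statistics


def exact_variance_of_square(mu, sigma):
    """Exact variance of X**2 when X is normal with mean mu and s.d. sigma."""
    return 4 * sigma**2 * mu**2 + 2 * sigma**4


def heuristic_variance_of_square(mu, sigma):
    """Heuristic value [f'(mu) * sigma]**2 for f(x) = x**2."""
    return (2 * mu * sigma) ** 2


mu, sigma = 1.0, 1.0  # illustrative values, chosen arbitrarily
exact = exact_variance_of_square(mu, sigma)
approx = heuristic_variance_of_square(mu, sigma)

# Monte Carlo check of the exact formula (fixed seed for reproducibility).
rng = random.Random(0)
sample = [rng.gauss(mu, sigma) ** 2 for _ in range(200_000)]
simulated = statistics.pvariance(sample)

print(exact, approx, round(simulated, 2))
```

The neglected term $2\sigma^4$ is negligible only when $\sigma$ is small relative to $\mu$, which is why the heuristic needs the asymptotic justification developed in section 3.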

Indeed, it is easily seen that in important special cases the problem of sta- 
bilization as above stated could have no solution other than the trivial one in 
which $T$ is identically constant on the set of points of increase of the d.f.³ of $X$.
For instance, if $X$ has a Poisson exponential distribution, then the identity
$E[\{f(X) - E[f(X)]\}^2] = c$, or $E\{[f(X)]^2\} = c + \{E[f(X)]\}^2$, becomes

$$\sum_{x=0}^{\infty} [f(x)]^2\,\frac{e^{-\mu}\mu^x}{x!} = c + \left(\sum_{x=0}^{\infty} f(x)\,\frac{e^{-\mu}\mu^x}{x!}\right)^2, \qquad \mu > 0.$$

Expanding both sides in powers of $\mu$, we need only equate the coefficients of the
zero-th power of $\mu$ on each side to find that $[f(0)]^2 = c + [f(0)]^2$, which implies
that $c = 0$ and hence that $f(0) = f(1) = f(2) = \cdots$. A similar demonstration
can be given for the case in which X has a binomial distribution with a fixed 
number of values of the variate. 

As to the problem of choosing $T = f(X)$ so that its distribution is exactly
normal, we can observe immediately that a single-valued function $f(X)$ will
never transform a variate $X$ with a discrete distribution into a variate with a
continuous one. On the other hand, any variate $X$ with a continuous d.f.
$F(x)$ can be transformed into a normally distributed variate $T$ by the
transformation $T = f(X)$ defined by the equation

$$F(X) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{T} e^{-t^2/2}\,dt.$$

However, aside from the practical difficulty of solving this equation for $T$, the
resulting function $T = f(X)$ will not generally be functionally independent of
the mean of $X$.
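A small Python sketch may make the last remark concrete. It is an added illustration under an assumed example: $X$ is taken to be exponentially distributed, so that $F(x) = 1 - e^{-x/\mu}$, and the equation $F(X) = \Phi(T)$ is solved with the standard library's inverse normal d.f. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def normalizing_transform(x, mean):
    """Solve F(x) = Phi(T) for T, with F the d.f. of an exponential variate."""
    prob = 1.0 - math.exp(-x / mean)   # F(x) for the assumed exponential law
    return NormalDist().inv_cdf(prob)  # T such that Phi(T) = F(x)


# The same observed value x = 1 is mapped to different T for different means,
# so the transformation is not functionally independent of the mean.
t_mean_1 = normalizing_transform(1.0, mean=1.0)
t_mean_2 = normalizing_transform(1.0, mean=2.0)
print(t_mean_1, t_mean_2)
```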

These considerations lead us to seek asymptotic solutions to the problems of
normalization and stabilization. Such solutions are considered in the next 
section. 


3. Asymptotic theorems. In the remainder of this paper, we shall suppose 
that the distribution of X depends on a parameter n which is to tend somehow to 


2 Tippett [14] says: "This derivation is not mathematically sound, and the result is only
justified if on application it is found to be satisfactory."

3 I.e., distribution function. For any given one-dimensional variate $X$ we shall denote
the probability or relative frequency assigned to a set $R$ by $P(R)$. The d.f. of the variate
then is the point function $F(x) = P(X \le x)$. This function is sometimes called the
cumulative frequency function of $X$.


infinity. The mean $\mu = \mu_n$ of $X$, with range $S_n$, will in general depend upon $n$
(although by this we do not mean to exclude the case in which $\mu_n$ is constant for
all values of $n$), and perhaps will depend also on some further independent
parameters, which we shall denote collectively by $\theta$, with range $\Sigma$. We shall
seek a variate $T = f(X)$, in which $f(X)$ is functionally independent of $\mu$ and of
the parameters $\theta$ for $\mu$ on $S_n$, $\theta$ on $\Sigma$, and such that the distribution of
$f(X) - f(\mu_n)$ tends as $n \to \infty$ to a normal distribution, while
$\lim_{n\to\infty} \sigma_T^2 = c^2$, where $c^2$ is
an absolute constant. It is implied here that in case the additional parameters
$\theta$ are present, the function $f(X)$ may depend non-trivially on $n$; but if $n$ is the
only parameter on which the distribution of $X$ depends, then $f(X)$ must be
functionally independent of $n$.

A solution to the problem just proposed is given in certain cases by Theorems 
3.1 and 3.2 below, which are suggested by the heuristic reasoning of the second 
paragraph of section 2. 

THEOREM 3.1. Let $\psi_n(x)$ be a non-negative function of $x$ and $n$, defined almost
everywhere and integrable⁴ with respect to $x$ over any finite interval of the $x$-axis for
each $n > 0$. Let

$$T = f(X) = \int_a^X \psi_n(x)\,dx,$$

where $a$ is an arbitrary constant. Let $F_n(y)$ be the d.f. of the variate
$Y = (X - \mu_n)\psi_n(\mu_n)$. Suppose further that a continuous d.f. $F(y)$ exists such that
$\lim_{n\to\infty} F_n(y) = F(y)$ for all values of $y$. Then either one of the following two
conditions is a sufficient condition for the d.f. $H_n(w)$ of the variate $W = f(X) - f(\mu_n)$
to tend uniformly to $F(w)$, $-\infty < w < \infty$:

(a) To each $w$ for which $0 < F(w) < 1$, there corresponds for all $n$ sufficiently
large at least one root $x = x_n$ to the equation

$$\int_{\mu_n}^{x} \psi_n(u)\,du = w, \tag{3.1}$$

and this root $x_n$ has the property that

$$\lim_{n\to\infty} (x_n - \mu_n)\psi_n(\mu_n) = w. \tag{3.2}$$

(b) For all $n$ sufficiently large, $\psi_n(\mu_n) > 0$, and $\lim_{n\to\infty} q_n(w) = 1$ uniformly in
any closed finite subinterval of the open interval defined by $0 < F(w) < 1$, where

$$q_n(w) = \frac{\psi_n\bigl(w[\psi_n(\mu_n)]^{-1} + \mu_n\bigr)}{\psi_n(\mu_n)}. \tag{3.3}$$

To prove this theorem we shall first suppose that condition (a) is satisfied.
Let $w_1$ and $w_2$ be the end points of the open interval (possibly infinite) defined by
$0 < F(w) < 1$. If $w$ lies in this interval, and if $n$ is large enough for the root
$x_n$ in (3.1) to exist, then from the monotonic character of $\int \psi_n(x)\,dx$ we can





4 “Integrable” here means absolutely integrable in the sense of Lebesgue. 

















infer that 


$$\begin{aligned}
H_n(w) &= P[f(X) - f(\mu_n) \le w] = P\left[\int_{\mu_n}^{X} \psi_n(x)\,dx \le w\right] \\
&= P(X \le x_n) = P[Y \le (x_n - \mu_n)\psi_n(\mu_n)] \\
&= F_n[(x_n - \mu_n)\psi_n(\mu_n)].
\end{aligned} \tag{3.4}$$

Since $F(w)$ is continuous, $\lim_{n\to\infty} F_n(w) = F(w)$ uniformly on any finite or
infinite interval of values of $w$, as is well known.⁵ Therefore $\lim_{n\to\infty} F_n(w_n) = F(w)$
if $\lim_{n\to\infty} w_n = w$. Thus from (3.2) and (3.4), we find that $\lim_{n\to\infty} H_n(w) = F(w)$
for $w_1 < w < w_2$.

If $w' \le w_1$, and $w_1 < w'' < w_2$, then $0 \le H_n(w') \le H_n(w'') = F(w'') + [H_n(w'') - F(w'')]$.
We can make the right hand member of this relation less
than any given positive number $\epsilon$ by first choosing $w''$ so that $F(w'') < \tfrac12\epsilon$ (it
will be remembered that $F(w)$ is a continuous d.f., and $F(w_1) = 0$) and then
choosing $n$ so large that the quantity in square brackets is also less than $\tfrac12\epsilon$ in
absolute value. Thus $\lim_{n\to\infty} H_n(w') = 0$. Similarly if $w' \ge w_2$, we can show
that $\lim_{n\to\infty} H_n(w') = 1$. Hence $\lim_{n\to\infty} H_n(w) = F(w)$ for all $w$, and it follows
that the limit is uniform on any finite or infinite interval of values of $w$.

We shall now show that condition (a) in the theorem is a consequence of
condition (b). The result follows at once from the following simple lemma:

LEMMA. If $q_n(w)$ is a non-negative function integrable over any finite interval
of values of $w$, and if $\lim_{n\to\infty} q_n(w) = 1$ uniformly in any finite closed subinterval of
an interval $w_1 < w < w_2$, then for every value of $w$ in this interval there exists for all
$n$ sufficiently large a solution $y = y_n$ of the equation $\int_0^y q_n(z)\,dz = w$, and the
solution $y_n$ has the property that $\lim_{n\to\infty} y_n = w$.

For it is clear that if $w$ satisfies the inequality $w_1 < w < w_2$, and if $\eta > 0$
be chosen so that $w_1 < w - \eta < w + \eta < w_2$, then for all $n$ sufficiently large,

$$\int_0^{w-\eta} q_n(z)\,dz < w < \int_0^{w+\eta} q_n(z)\,dz.$$

Thus for each $n$ sufficiently large, there exists a root $y_n$ of the equation
$\int_0^y q_n(z)\,dz = w$, and furthermore, this root satisfies the inequality
$w - \eta \le y_n \le w + \eta$. Since $\eta$ is arbitrarily small, the proof of the lemma is complete.

To apply the lemma, we make the change of variables $z = (u - \mu_n)\psi_n(\mu_n)$
in the integral in (3.1), which reduces it to the form

$$\int_0^y q_n(z)\,dz, \qquad y = (x - \mu_n)\psi_n(\mu_n), \tag{3.5}$$

and the conclusion that (a) is implied by (b) now follows at once.


5 See [7], Theorem 11, pp. 29-30; also [8]. 







We add the remark that the uniformity of the limit of $q_n(w)$ in condition (b)
may be replaced by the condition that for each closed finite sub-interval there
exists a function $g(w)$ which dominates $q_n(w)$ for all $n$ sufficiently large.

Our second theorem, which is stated in the terminology and notation of
Theorem 3.1, is concerned with the limit of the variance of $T = f(X)$. From
the mere fact that the distribution of $W$ tends to a limiting form, it by no means
follows that the mean and variance of the distribution of $W$ approach those of
the limiting form, as may be shown by trivial examples. Thus additional
hypotheses on $\psi_n(x)$ and on the behavior of the distribution of $Y$ become
necessary.

THEOREM 3.2. Let $T$ (or $f(X)$), $Y$, $F_n(y)$ and $F(y)$ be defined as in Theorem
3.1. Let the mean and variance of the distribution defined by $F(y)$ exist and have
respective values $0$ and $c^2$. Then the following three conditions, taken together, are
sufficient that

$$\lim_{n\to\infty} [E(T) - f(\mu_n)] = 0, \tag{3.6}$$
$$\lim_{n\to\infty} \sigma_T^2 = c^2: \tag{3.7}$$

(i) $E(Y^2)$ exists for $n > 0$, and $\lim_{n\to\infty} E(Y^2) = c^2$.

(ii) Condition (b) of Theorem 3.1 holds.

(iii) $f\bigl(Y[\psi_n(\mu_n)]^{-1} + \mu_n\bigr) - f(\mu_n) = O(|Y|)$ uniformly in $n$ as $|Y| \to \infty$.

As a preliminary step in the proof, we observe that (i) and the relations
$\lim_{n\to\infty} F_n(y) = F(y)$, $c^2 = \int_{-\infty}^{+\infty} y^2\,dF(y)$, imply that the improper integral
$\int_{-\infty}^{+\infty} y^2\,dF_n(y)$ converges uniformly in $n$ for $n > 0$. As the integrand is positive,
the following result is equivalent to the uniform convergence of the integral:
For every $\epsilon > 0$, there exist numbers $A_1$ and $A_2$, $A_1 < A_2$, such that for all $n$
sufficiently large,

$$\left(\int_{-\infty}^{A_1} + \int_{A_2}^{+\infty}\right) y^2\,dF_n(y) < \epsilon.$$

To prove this, we write

$$\left(\int_{-\infty}^{A_1} + \int_{A_2}^{+\infty}\right) y^2\,dF_n(y) = [E(Y^2) - c^2] + \left[\int_{A_1}^{A_2} y^2\,dF(y) - \int_{A_1}^{A_2} y^2\,dF_n(y)\right] + \left[c^2 - \int_{A_1}^{A_2} y^2\,dF(y)\right].$$

We first choose $A_1$ and $A_2$ so that the last bracket here is less than $\tfrac12\epsilon$ in absolute
value. By condition (i), the first bracket approaches zero as $n$ tends to infinity,
and the Helly-Bray theorem [10, p. 15] states that the second bracket also
approaches zero as $n$ tends to infinity, so for all $n$ sufficiently large, the sum of the
first two brackets is in absolute value less than $\tfrac12\epsilon$.

It is important to notice that we can always choose $A_1$ and $A_2$ in the above
demonstration so that $A_1 > w_1$, $A_2 < w_2$, where $w_1$ and $w_2$ are as usual the
endpoints of the interval defined by $0 < F(w) < 1$.

To continue with the proof of the theorem, we remark that by a change of
variables similar to the one used to derive (3.5), the function $W = f(X) - f(\mu_n)$
may be expressed as a function of $Y$ in the following manner:

$$W = \int_{\mu_n}^{X} \psi_n(x)\,dx = \int_0^{Y} q_n(w)\,dw = Q_n(Y),$$

where $q_n(w)$ is given by (3.3). In terms of $W$, (3.6) and (3.7) become,
respectively,

$$\lim_{n\to\infty} E(W) = 0, \tag{3.8}$$
$$\lim_{n\to\infty} \{E(W^2) - [E(W)]^2\} = c^2, \tag{3.9}$$

and these are the equations which we now establish.

Conditions (ii) and (iii) obviously imply that $\lim_{n\to\infty} Q_n(y) = y$ uniformly in
any finite closed subinterval of the interval $w_1 < y < w_2$, and that a constant $M$
exists such that $|Q_n(y)| < M|y|$ for all $n$. If $E(Y^2)$ exists, so will $E(Y)$.
Now

$$\begin{aligned}
E(W) &= \int_{-\infty}^{+\infty} Q_n(y)\,dF_n(y) \\
&= \int_{-\infty}^{+\infty} Q_n(y)\,dF_n(y) - \int_{-\infty}^{+\infty} y\,dF_n(y) \\
&= \left(\int_{-\infty}^{A_1} + \int_{A_2}^{+\infty}\right) [Q_n(y) - y]\,dF_n(y) + \int_{A_1}^{A_2} [Q_n(y) - y]\,dF_n(y),
\end{aligned}$$

where $w_1 < A_1 < A_2 < w_2$. Therefore

$$|E(W)| \le \left(\int_{-\infty}^{A_1} + \int_{A_2}^{+\infty}\right) (M + 1)|y|\,dF_n(y) + \left|\int_{A_1}^{A_2} [Q_n(y) - y]\,dF_n(y)\right|.$$

From the uniform convergence of $\int_{-\infty}^{+\infty} y^2\,dF_n(y)$, proved above, we can conclude
that the pair of improper integrals in this inequality can be made less than an
arbitrary $\tfrac12\epsilon > 0$ by proper choice of $A_1$ and $A_2$. The third integral approaches
zero, by the general Helly-Bray Theorem [10, p. 16], and so becomes less than
$\tfrac12\epsilon$ for all $n$ sufficiently large. Thus we have established (3.8). To show that
(3.9) is true, we have merely to prove that $\lim_{n\to\infty} E(W^2) = c^2$. Since
$E(Y^2) = \int_{-\infty}^{+\infty} y^2\,dF_n(y)$, we may write

$$E(W^2) - c^2 = \int_{-\infty}^{+\infty} \{[Q_n(y)]^2 - y^2\}\,dF_n(y) + [E(Y^2) - c^2].$$


The integral may be shown to approach zero by the argument used in the case of
$E(W)$, and the required result then follows from condition (i) of the theorem.
The proof is now complete.

The sufficient conditions in Theorem 3.2 can be modified in various more or
less obvious ways. The existence of the limiting d.f. $F(y)$ was essentially used
in the proof only to secure the uniform convergence of $\int_{-\infty}^{+\infty} y^2\,dF_n(y)$. Condition
(ii) can again be modified along the lines suggested at the end of the proof of
Theorem 3.1. Condition (iii) was used only to secure the uniform convergence
of the integral $\int_{-\infty}^{+\infty} [Q_n(y)]^2\,dF_n(y)$.


For later reference, we shall supplement Theorems 3.1 and 3.2 with the follow- 
ing simple result, which is practically self-evident. 

THEOREM 3.3. Let the distribution of a variate $Y$ depend upon a parameter $n$,
let $F_n(y)$ be the d.f. of $Y$, and let $F(y)$ be a continuous d.f. with the property that
$\lim_{n\to\infty} F_n(y) = F(y)$. Let $a_n$ be a function of $n$ such that $\lim_{n\to\infty} a_n = a \ne 0$.
Then the d.f. of the variate $Z = a_n Y$ tends as $n \to \infty$ to the d.f. $F(z/a)$ if $a > 0$,
and to the d.f. $1 - F(z/a)$ if $a < 0$. If the variance of $Y$ exists and tends to $c^2$
as $n \to \infty$, then the variance of $a_n Y$ tends to $a^2c^2$ as $n \to \infty$.

If $F(y)$ is the d.f. of a reduced normal distribution, i.e.,

$$F(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2}\,dt,$$

then $F(z/a)$ is also the d.f. of a normal distribution with mean zero and variance
$a^2$. More generally, any affine transformation of a normal variate yields
another normal variate.


4. Applications. The theorems of the preceding section have the effect of
referring the properties of the distribution of the transformation $T = f(X)$ of
Theorem 3.1 back to those of the distribution of a related variate $Y$. In the
applications given in the present section, we shall let $\psi_n(\mu_n)$ be proportional to
the reciprocal of the standard deviation of $X$. The theorems of section 3 state
in this case that if the reduced, or standardized, distribution of $X$ approaches a
limiting form, then under certain circumstances, the distribution of $f(X) - f(\mu_n)$
will approach a similar limiting form, and $\sigma_T^2$ will approach a quantity
independent at least of $n$. In the applications considered here, the reduced
distribution of $X$ will always approach the reduced normal distribution.


(I) The square root transformation for a variate with a Poisson exponential
distribution. Let $X$ have a Poisson exponential distribution with parameter $n$.
If $a$ is an arbitrary constant, and if

$$T = f(X) = \begin{cases} \sqrt{X + a}, & X \ge -a, \\ 0, & X < -a, \end{cases} \tag{4.1}$$

then the distribution of $T - \sqrt{n + a}$ tends as $n \to \infty$ to a normal distribution
which has mean zero and variance $\tfrac14$, and $\lim_{n\to\infty} \sigma_T^2 = \tfrac14$. For $\mu_n = n$, $\sigma_X = \sqrt{n}$,
and it is well known⁶ that the distribution of the reduced variate $(X - n)/\sqrt{n}$
tends to the reduced normal distribution as $n \to \infty$. By Theorem 3.3, the
distribution of the variate

$$Y = \frac{X - n}{2\sqrt{n + a}} = \frac{\sqrt{n}}{2\sqrt{n + a}} \cdot \frac{X - n}{\sqrt{n}}$$

will tend to normality as $n \to \infty$, and the variance of $Y$ will tend to the value $\tfrac14$,
which is also the variance of the limiting distribution. Setting

$$\psi_n(x) = \begin{cases} \dfrac{1}{2\sqrt{x + a}}, & x > -a, \\ 0, & x \le -a, \end{cases}$$

we obtain from $T = f(X) = \int_{-a}^{X} \psi_n(x)\,dx$ the formula given in (4.1). To prove
the statement in italics, we must show that conditions (ii) and (iii) of Theorem
3.2 are satisfied. We have, assuming $n > -a$,

$$q_n(w) = \begin{cases} \left(1 + \dfrac{2w}{\sqrt{n + a}}\right)^{-1/2}, & w > -\tfrac12\sqrt{n + a}, \\ 0, & w \le -\tfrac12\sqrt{n + a}, \end{cases}$$

so clearly (ii) is satisfied. Also,

$$W = f\bigl(Y[\psi_n(\mu_n)]^{-1} + \mu_n\bigr) - f(\mu_n) = \begin{cases} \sqrt{2Y\sqrt{n + a} + n + a} - \sqrt{n + a}, & Y > -\tfrac12\sqrt{n + a}, \\ -\sqrt{n + a}, & Y \le -\tfrac12\sqrt{n + a}, \end{cases}$$

from which it follows at once that $|W| \le 2|Y|$ for all $Y$, and so (iii) is satisfied.

The degree of approximation involved in the equation $\lim_{n\to\infty} \sigma_T^2 = \tfrac14$ has been
investigated numerically by Bartlett [1] for values of $n$ from .5 to 15.0 in the
cases $a = 0$ and $a = \tfrac12$. He found that the variance of $\sqrt{X + \tfrac12}$ is considerably
closer to the limit $\tfrac14$ for $1 \le n \le 10$ than is the variance of $\sqrt{X}$. At
$n = 15$, the variance of $\sqrt{X}$ is .256, and that of $\sqrt{X + \tfrac12}$ is .248.
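Figures of this kind can be reproduced by summing the Poisson series directly. The Python sketch below is an added illustration and is not Bartlett's own computation; the function name and the truncation point of the series are assumptions, the latter being ample for the small means considered.

```python
import math


def poisson_sqrt_variance(n, a, terms=400):
    """Variance of sqrt(X + a) for X Poisson with mean n (truncated series)."""
    pmf = math.exp(-n)       # P(X = 0)
    m1 = m2 = 0.0
    for x in range(terms):
        m1 += math.sqrt(x + a) * pmf   # accumulates E[sqrt(X + a)]
        m2 += (x + a) * pmf            # accumulates E[X + a]
        pmf *= n / (x + 1)             # P(X = x + 1) from P(X = x)
    return m2 - m1 * m1


for n in (1.0, 5.0, 10.0, 15.0):
    v0 = poisson_sqrt_variance(n, 0.0)
    v_half = poisson_sqrt_variance(n, 0.5)
    print(n, round(v0, 3), round(v_half, 3))
```

At $n = 15$ the two computed variances should agree, to the accuracy quoted, with the values .256 and .248 cited above.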

The question of the degree of convergence to normality and of the possibility
of selecting an optimum value of $a$ remain open. By expanding the function
$\sqrt{X + a}$ in a Taylor series about $X = n$ with remainder in the form due to
Schlömilch, it is possible to derive as accurate an estimate of $|\sigma_T^2 - \tfrac14|$ as may
be desired. A rough result easily obtainable by this method is that
$|\sigma_T^2 - \tfrac14| < 3/(4n)$, $n > 0$.

6 See (e.g.) [9].

(II) The square root transformation for a variate with a $\Gamma$ distribution.
Let $X$ have a distribution whose density function is of the following type:

$$\varphi(x) = \begin{cases} 0, & x < 0, \\ K x^{(n/2) - 1} e^{-hx}, & x \ge 0,\ h > 0. \end{cases} \tag{4.2}$$

If $a$ is an arbitrary constant, and if

$$T = f(X) = \begin{cases} \sqrt{X + a}, & X \ge -a, \\ 0, & X < -a, \end{cases} \tag{4.3}$$

then the distribution of $T - \sqrt{(n/2h) + a}$ tends as $n \to \infty$ to a normal distribution
which has mean zero and variance $1/(4h)$, and $\lim_{n\to\infty} \sigma_T^2 = 1/(4h)$. For
$\mu_n = n/(2h)$, $\sigma_X = \sqrt{n}/(h\sqrt{2}) = \sqrt{\mu_n/h}$. The distribution of the reduced variate
tends to normality as $n \to \infty$,⁷ so that of the variate

$$Y = \frac{X - \mu_n}{2\sqrt{\mu_n + a}} = \frac{1}{2\sqrt{h}} \cdot \frac{\sqrt{\mu_n}}{\sqrt{\mu_n + a}} \cdot \frac{X - \mu_n}{\sqrt{\mu_n/h}}$$

tends to normality also with limiting variance $1/(4h)$. Setting

$$\psi_n(x) = \begin{cases} \dfrac{1}{2\sqrt{x + a}}, & x > -a, \\ 0, & x \le -a, \end{cases}$$

we obtain $T$ in (4.3) from the relation $T = \int_{-a}^{X} \psi_n(x)\,dx$. The work of verifying
that the conditions of Theorem 3.2 are satisfied is the same as in the case of the
Poisson exponential distribution treated above, and will not be repeated.

For example, if $s^2$ denotes the variance of a random sample of $n + 1$ observations
drawn from a normal parent distribution with variance $\sigma^2$, then it is well
known that $(n + 1)s^2$ is distributed according to (4.2) with $h = 1/(2\sigma^2)$. We
thus can deduce the further facts, also well known, that the distribution of
$s\sqrt{n + 1} - \sigma\sqrt{n}$ tends to normality, and that the variance of $s\sqrt{n + 1}$
approaches the limiting value $\tfrac12\sigma^2$. If $n$ is an integer and $h = \tfrac12$, the distribution
defined by (4.2) is called a $\chi^2$ distribution with $n$ degrees of freedom, and the
variate is often denoted by $\chi^2$. Our conclusion in this case is that the distribution
of $\sqrt{2\chi^2} - \sqrt{2n}$ tends to a normal one with zero mean and unit variance.
From this result and the fact that $\sqrt{2n - 1} - \sqrt{2n} = O(n^{-1/2})$, it follows
immediately that $\sqrt{2\chi^2} - \sqrt{2n - 1}$ has the same limiting distribution as
$\sqrt{2\chi^2} - \sqrt{2n}$. This result,⁸ due to Fisher, is familiar to all users of his table of
the probability levels of $\chi^2$.
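The limiting statement for $\chi^2$ can be examined by simulation. The Python sketch below is an added illustration; the number of degrees of freedom, the sample size and the seed are arbitrary assumptions, and the $\chi^2$ variate is drawn as a gamma variate with shape $n/2$ and scale 2.

```python
import math
import random
import statistics


def fisher_deviates(df, size, seed=0):
    """Simulated values of sqrt(2 * chi2) - sqrt(2 * df - 1)."""
    rng = random.Random(seed)
    values = []
    for _ in range(size):
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        values.append(math.sqrt(2.0 * chi2) - math.sqrt(2.0 * df - 1.0))
    return values


z = fisher_deviates(df=50, size=50_000)
mean_z = statistics.fmean(z)
var_z = statistics.pvariance(z)
print(round(mean_z, 3), round(var_z, 3))
```

Under the stated result the sample mean should be near zero and the sample variance near unity, up to sampling error and terms that vanish as $n$ grows.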








7 See (e.g.) [9].
8 For a discussion of the degree of convergence involved here, see [9].








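(III) The inverse sine transformation for a binomial variate. Let $X$ have
a binomial relative frequency distribution with parameter $p$ and the $n$ values
$0, 1/n, 2/n, \cdots, n/n$. If $a$ is an arbitrary constant, and if

$$T = f(X) = \begin{cases} \sqrt{n}\,\sin^{-1}\sqrt{X + \dfrac{a}{n}}, & -\dfrac{a}{n} \le X \le 1 - \dfrac{a}{n}, \\ 0, & X < -\dfrac{a}{n},\ X > 1 - \dfrac{a}{n}, \end{cases} \tag{4.4}$$

where $T$ is measured in radians, then the distribution of $T - \sqrt{n}\,\sin^{-1}\sqrt{p + (a/n)}$
tends as $n \to \infty$ to a normal distribution which has mean zero and variance $\tfrac14$, and
$\lim_{n\to\infty} \sigma_T^2 = \tfrac14$. For here, $\mu_n = p$, and $\sigma_X^2 = pq/n$, where $q = 1 - p$; and the
familiar DeMoivre-Laplace theorem states that the distribution of the reduced
variate $\sqrt{n}(X - p)/\sqrt{pq}$ will tend to normality as $n \to \infty$. Hence by
Theorem 3.3 the distribution of

$$Y = \frac{\sqrt{n}(X - p)}{2\sqrt{\left(p + \dfrac{a}{n}\right)\left(1 - p - \dfrac{a}{n}\right)}} \tag{4.5}$$

will tend to normality with a limiting variance of $\tfrac14$, which is also the variance of
the limiting distribution. Setting

$$\psi_n(x) = \begin{cases} \dfrac{\sqrt{n}}{2\sqrt{\left(x + \dfrac{a}{n}\right)\left(1 - x - \dfrac{a}{n}\right)}}, & -\dfrac{a}{n} < x < 1 - \dfrac{a}{n}, \\ 0, & x \le -\dfrac{a}{n},\ x \ge 1 - \dfrac{a}{n}, \end{cases}$$

we obtain (4.4) from the integral

$$T = \int_{-a/n}^{X} \psi_n(x)\,dx.$$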

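In proving the conditions (ii) and (iii) of Theorem 3.2 are satisfied, we shall
assume for simplicity that $a = 0$. We find that

$$q_n(w) = \begin{cases} \left[1 + \dfrac{2(q - p)w}{\sqrt{npq}} - \dfrac{4w^2}{n}\right]^{-1/2}, & -\dfrac12\sqrt{\dfrac{np}{q}} < w < \dfrac12\sqrt{\dfrac{nq}{p}}, \\ 0, & w \le -\dfrac12\sqrt{\dfrac{np}{q}},\ w \ge \dfrac12\sqrt{\dfrac{nq}{p}}, \end{cases}$$

so obviously (ii) is satisfied. From the Law of the Mean in the form due to
Schlömilch, we have

$$W = \sqrt{n}\,\sin^{-1}\sqrt{p + 2\sqrt{\frac{pq}{n}}\,Y} - \sqrt{n}\,\sin^{-1}\sqrt{p} = 2Y\left[\frac{1 - \theta}{\left(1 + 2\theta\sqrt{\dfrac{q}{np}}\,Y\right)\left(1 - 2\theta\sqrt{\dfrac{p}{nq}}\,Y\right)}\right]^{1/2}, \tag{4.6}$$

$$0 < \theta < 1, \qquad -\frac12\sqrt{\frac{np}{q}} < Y < \frac12\sqrt{\frac{nq}{p}}.$$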


The denominator of the coefficient of $2Y$ here is a quadratic function of $Y$ with a
negative coefficient of $Y^2$, and so must assume its least value in the $Y$ range
indicated in (4.6) at one end or the other of the range. From this it is readily
seen that the coefficient of $2Y$ is actually always less than unity. For values of
$Y$ outside the range, the second member of (4.6) indicates that $W = O(\sqrt{n}) = O(Y)$.
Hence (iii) is satisfied, and the proof of the statement in italics is complete
for the case $a = 0$. The more general case presents no important new
difficulties.

In practice, it is often convenient to express $X$ as a percentage. This merely
has the effect of multiplying $Y$ in (4.5) by 100. We find in this case that
$\sqrt{n}\,\sin^{-1}\sqrt{X + 100a/n} - \sqrt{n}\,\sin^{-1}\sqrt{100p + 100a/n}$ has a distribution
approaching normality, and $\sigma_T \to 50$ instead of $\tfrac12$.

Bartlett [1] gives numerical results in the cases $n = 10$, $a = 0$ and $n = 10$,
$a = \tfrac12$, which indicate that perhaps the choice $a = \tfrac12$ is more suitable if the
estimated $p$ is near 0 or 1, but the choice $a = 0$ is preferable if the estimated $p$
lies between .3 and .7. However, there seems to be no good reason to believe
that these conclusions should be valid for other values of $n$. The questions of an
optimum $a$ and of the degree of convergence to normality remain open. We
note in passing that the latter problem could doubtless be profitably studied by
combining the methods of proof of Theorem 3.1 with the results of Uspensky
[15, pp. 129-130] on the degree of approximation of the reduced binomial d.f.
to the normal d.f.
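The approach of $\sigma_T^2$ to $\tfrac14$ in the case $a = 0$ can be examined by exact summation over the binomial distribution. The Python sketch below is an added illustration, not Bartlett's computation; the function name and the particular values of $n$ and $p$ are assumptions.

```python
import math


def arcsine_variance(n, p):
    """Exact variance of T = sqrt(n) * asin(sqrt(X)), X = k/n, k binomial(n, p)."""
    q = 1.0 - p
    m1 = m2 = 0.0
    for k in range(n + 1):
        prob = math.comb(n, k) * p**k * q ** (n - k)       # P(X = k/n)
        t = math.sqrt(n) * math.asin(math.sqrt(k / n))     # T in radians
        m1 += prob * t
        m2 += prob * t * t
    return m2 - m1 * m1


for n in (10, 100, 400):
    print(n, [round(arcsine_variance(n, p), 4) for p in (0.1, 0.3, 0.5)])
```

For fixed $p$ the printed variances should settle toward $\tfrac14$ as $n$ increases; how quickly they do so for $p$ near 0 or 1 bears on the open question of the degree of convergence mentioned above.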

(IV) Other transformations of a binomial variate. Let $X$ have a binomial
relative frequency distribution with the parameter $p$ and the $n$ values
$0, 1/n, 2/n, \cdots, n/n$.

(a) If

$$T = f(X) = \begin{cases} \sqrt{n}\,\sinh^{-1}\sqrt{X} = \sqrt{n}\,\log\bigl(\sqrt{X} + \sqrt{1 + X}\bigr), & X \ge 0, \\ 0, & X < 0, \end{cases}$$

then the distribution of $T - \sqrt{n}\,\sinh^{-1}\sqrt{p}$ tends as $n \to \infty$ to a normal distribution
which has mean zero and variance $q/(4 + 4p)$, and $\lim_{n\to\infty} \sigma_T^2 = q/(4 + 4p)$.

(b) If

$$T = f(X) = \begin{cases} \sqrt{n}\,\log X, & X > 0, \\ 0, & X \le 0, \end{cases}$$

then the distribution of $T - \sqrt{n}\,\log p$ tends as $n \to \infty$ to a normal distribution
which has mean zero and variance $q/p$, and $\lim_{n\to\infty} \sigma_T^2 = q/p$.

(c) If

$$T = f(X) = \begin{cases} \tfrac12\sqrt{n}\,\log\dfrac{X}{1 - X}, & 0 < X < 1, \\ 0, & X \le 0,\ X \ge 1, \end{cases}$$

then the distribution of $T - \tfrac12\sqrt{n}\,\log\dfrac{p}{1 - p}$ tends as $n \to \infty$ to a normal distribution
which has mean zero and variance $1/(4pq)$, and $\lim_{n\to\infty} \sigma_T^2 = 1/(4pq)$.

9 All logarithms in this paper are to the base e.


Since the limiting variance of each of these transformations involves the
parameter $p$, they are not to be regarded as solutions of the problem of
asymptotic variance stabilization proposed at the beginning of section 3, although it is
perhaps of some interest that their distributions become asymptotically normal.
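The dependence of these limiting variances on $p$ can be tabulated from the relation $[g'(p)]^2\,pq$, where $g$ is the transformation without its factor $\sqrt{n}$; this is the limiting variance of $Y = (X - p)\psi_n(p)$ with $\psi_n = f'$. The Python sketch below is an added illustration; the helper name and the numerical differentiation step are assumptions, and the inverse sine case is included only for contrast.

```python
import math


def heuristic_limit(g, p, h=1e-6):
    """[g'(p)]**2 * p * (1 - p), with g'(p) taken by a central difference."""
    slope = (g(p + h) - g(p - h)) / (2.0 * h)
    return slope * slope * p * (1.0 - p)


# Each entry: (transformation without sqrt(n), limiting variance stated in the text).
CASES = {
    "inverse sine": (lambda x: math.asin(math.sqrt(x)), lambda p: 0.25),
    "(a) inverse sinh": (lambda x: math.asinh(math.sqrt(x)), lambda p: (1 - p) / (4 + 4 * p)),
    "(b) log": (math.log, lambda p: (1 - p) / p),
    "(c) half log-odds": (lambda x: 0.5 * math.log(x / (1 - x)), lambda p: 1 / (4 * p * (1 - p))),
}

for name, (g, stated) in CASES.items():
    row = [(round(heuristic_limit(g, p), 4), round(stated(p), 4)) for p in (0.2, 0.5, 0.8)]
    print(name, row)
```

Only the inverse sine row is constant in $p$, which is the sense in which cases (a), (b) and (c) fail to stabilize the variance.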

In case (a), $f'(x) = \sqrt{n}/(2\sqrt{x^2 + x})$, $x > 0$. Setting $\psi_n(x) = f'(x)$, $x > 0$,
and $\psi_n(x) = 0$, $x \le 0$, we obtain

$$Y = (X - p)\psi_n(p) = \frac{\sqrt{n}(X - p)}{\sqrt{pq}} \cdot \frac{\sqrt{q}}{2\sqrt{1 + p}}, \tag{4.7}$$

and this variate obviously has the limiting distribution ascribed to
$T - \sqrt{n}\,\sinh^{-1}\sqrt{p}$ in the statement in italics. The truth of that statement now
follows by an argument similar to that used in the case of the inverse sine
transformation.

If $p$ is allowed to vary with $n$ in such a way that $\lim_{n\to\infty} np = \infty$, it is known
that the reduced distribution of $X$ will still tend to normality.¹⁰ If we suppose
that $\lim_{n\to\infty} p = 0$, but $\lim_{n\to\infty} np = \infty$, we find from Theorem 3.3 that the
limiting distribution of $Y$ in (4.7) will be normal with mean zero and variance
$\tfrac14$ and that $\sigma_Y^2 \to \tfrac14$. It is easily verified that the conditions (ii) and (iii) of
Theorem 3.2 are still satisfied, so we find that the limiting distribution of
$[\sqrt{n}\,\sinh^{-1}\sqrt{X} - \sqrt{n}\,\sinh^{-1}\sqrt{p}]$ is normal, with mean zero and variance $\tfrac14$, and
$\sigma_T^2 \to \tfrac14$. However, since $n$ is now the only independent parameter, we cannot
here regard the transformation $T = \sqrt{n}\,\sinh^{-1}\sqrt{X}$ as a solution of the problem
of variance stabilization, because the variate $T$ depends explicitly upon $n$.

If in case (b) we proceed as in case (a), we obtain as the analogue of (4.7)
the formula

$$Y = (X - p)\psi_n(p) = \frac{\sqrt{n}(X - p)}{\sqrt{pq}} \cdot \sqrt{\frac{q}{p}},$$

and this variate has the limiting distribution ascribed to $T - \sqrt{n}\,\log p$ in the
statement in italics. It now turns out that although condition (ii) of Theorem
3.2 is satisfied, condition (iii) is not satisfied. We are then faced with the
problem of proving directly that the improper integral

$$\int_{-\sqrt{n}}^{\infty} \left[\sqrt{n}\,\log\left(p + \frac{py}{\sqrt{n}}\right) - \sqrt{n}\,\log p\right]^2 dF_n(y)$$

converges uniformly.¹¹ The trouble occurs only at the lower limit of integration,
and may be resolved by first integrating by parts, then dividing the range

10 See (e.g.) [9]. 
11 See the remarks following the proof of Theorem 3.2. 




$(-\sqrt{n}, A_1)$ into two ranges $(-\sqrt{n}, -\log n)$ and $(-\log n, A_1)$, and then
applying Uspensky's results [15, pp. 129-130], on the degree of approximation
involved in the DeMoivre-Laplace theorem.

Case (c) may be handled in a similar manner. 


5. The logarithmic transformation. We shall suppose throughout this section
that $X$ is a variate whose mean $\mu_n$ and standard deviation $\sigma$ are in the relation
$\sigma = k_n(\mu_n + a)$, where $a$ is an arbitrary constant, $k_n > 0$, and $\lim_{n\to\infty} k_n$ exists
and is finite. If $k_n$ is constant for all $n$, say $k_n = k > 0$, and if we use the
heuristic argument of the second paragraph of section 2 to attempt to find a
transformation which will stabilize the variance of $X$ at $k^2$, we arrive at the
function $T = \log(X + a)$, $X > -a$. It is the purpose of this section to study
the asymptotic properties of this transformation.

The theory of such a transformation differs in certain important respects
from that of the transformations considered in sections 3 and 4. For one thing,
our starting point in the study of each transformation considered in section 4 was
the fact that although $P(X < 0) = 0$, nevertheless the reduced distribution of
$X$ tended to normality as $n \to \infty$. But in the present case, if $X$ is a variate such
that $P(X < -a) = 0$, then the corresponding reduced variate
$Y = (X - \mu_n)/[k_n(\mu_n + a)]$ has a d.f. $F_n(y)$ such that $F_n(-1/k_n) = 0$. Thus if
$\lim_{n\to\infty} k_n = k > 0$, the limiting distribution of $Y$, if it exists, must have a d.f. $F(y)$ such that
$F(-1/k - 0) = 0$. Therefore the limiting distribution of $Y$ can never be normal
if $k > 0$.

Moreover (in contrast to the situation in Theorem 3.1) if the reduced variate
$Y$ does have a limiting distribution, the variate

$$W = \frac{1}{k_n}\log(X + a) - \frac{1}{k_n}\log(\mu_n + a) = \frac{1}{k_n}\log\frac{X + a}{\mu_n + a}, \qquad X > -a, \tag{5.1}$$

may have a limiting distribution which is not the same as that of $Y$. More
specifically, we have the following result:
THEOREM 5.1. Let $P(X \le -a) = 0$, let $\lim_{n\to\infty} k_n = k \ge 0$, let $F_n(y)$ be the
d.f. of the reduced variate

$$Y = \frac{X - \mu_n}{k_n(\mu_n + a)},$$

and let $H_n(w)$ be the d.f. of the variate $W$ given by (5.1). If a continuous d.f. $F(y)$
exists such that $\lim_{n\to\infty} F_n(y) = F(y)$ for all $y$, then

$$\lim_{n\to\infty} H_n(w) = \begin{cases} F\left(\dfrac{e^{kw} - 1}{k}\right), & k > 0, \\[2ex] F(w), & k = 0. \end{cases}$$













The proof is simpler than the statement; essentially we have only to notice that

$$H_n(w) = P\left[Y \le \frac{e^{k_n w} - 1}{k_n}\right] = F_n\left(\frac{e^{k_n w} - 1}{k_n}\right), \qquad -\infty < w < \infty,$$

and apply the reasoning used above in connection with (3.4).

From the study of the distribution of $T$, we now turn for a moment to the
question of the limit of $\sigma_T^2$. Here the situation is more consistent with the
results of section 3.

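THEOREM 5.2. Under the hypotheses of Theorem 5.1 and under the additional conditions
that the improper integral $\int_{-\infty}^{+\infty} w^2\, dH_n(w)$ (or $\int_{-1/k_n}^{+\infty} k_n^{-2}[\log(1 + k_n y)]^2\, dF_n(y)$)
converges uniformly in $n$ and that $\int_{-\infty}^{+\infty} y^2\, dF(y) = 1 = E(Y^2)$, the following relations
hold:

$$\lim_{n\to\infty} E(W) = \begin{cases} \displaystyle\int_{-1/k}^{+\infty} k^{-1}\log(1 + ky)\, dF(y), & k > 0, \\[2ex] 0, & k = 0, \end{cases} \tag{5.2}$$

$$\lim_{n\to\infty} E(W^2) = \begin{cases} \displaystyle\int_{-1/k}^{+\infty} k^{-2}[\log(1 + ky)]^2\, dF(y), & k > 0, \\[2ex] 1, & k = 0. \end{cases} \tag{5.3}$$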
no . | si 


The variance $\sigma_T^2$ of the variate $T = \log(X + a)$ is related to these mean values
by the equation $\sigma_T^2 = k_n^2\{E(W^2) - [E(W)]^2\}$. Thus if $F(y)$ is independent of
any unknown parameters $\theta$, and if $k$ is positive and is presumed to have the same
value for all variates in any given problem, then the transformation $T = \log(X + a)$
is seen to yield an asymptotic stabilization of the variance under
the conditions of Theorem 5.2. If $k = 0$, we find from either Theorem 5.2 or
the proof of Theorem 5.2 that $T = \log(X + a)$ converges stochastically to
$\log(\mu_n + a)$.

The proof of Theorem 5.2 is similar to that of Theorem 3.2 and will be omitted. 

Theorem 5.1 raises the following question: Just what limiting distribution 
must Y have if k > 0, in order that the distribution of W tend to normality? 
To answer this, we shall note the following simple non-asymptotic result: 

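THEOREM 5.3. A necessary and sufficient condition that $X$ have a continuous
distribution with density function

$$f(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi\log(k^2 + 1)}}\,\dfrac{1}{x + a}\,\exp\left\{-\dfrac{\left[\log\dfrac{(x + a)\sqrt{k^2 + 1}}{\mu + a}\right]^2}{2\log(k^2 + 1)}\right\}, & x > -a, \\[3ex] 0, & x \le -a, \end{cases} \tag{5.4}$$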




for which $\sigma_X = k(\mu + a)$, is that the variate $T = \log(X + a)$ have a normal
distribution with mean $\log(\mu + a) - \log\sqrt{k^2 + 1}$ and variance $\log(k^2 + 1)$.

The proof may be given by a routine change of variables.¹² It is to be noticed
that the heuristic argument of the second paragraph of section 2 would lead to
the incorrect result that the variance of $T$ was $k^2$ instead of $\log(k^2 + 1)$. In
case $k = 1$, the mean and variance of $T$ are respectively $\log(\mu + a) - .347$ and
$.693$. If the transformation $T = \log_{10}(X + a)$ is used, the new mean is
$\log_{10}(\mu + a) - \log_{10}\sqrt{k^2 + 1}$ and the new variance is $.189\log(k^2 + 1)$, which
for values of $k$ near zero has the approximate value $.189k^2$.¹³
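
The numerical constants quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `log_transform_moments` is hypothetical, and natural logarithms are assumed wherever the text writes log without a base.

```python
import math

def log_transform_moments(mu, a, k):
    """Mean and variance of T = log(X + a) when X has density (5.4)."""
    variance = math.log(k * k + 1.0)
    mean = math.log(mu + a) - 0.5 * variance   # log(mu + a) - log sqrt(k^2 + 1)
    return mean, variance

mean_shift = 0.5 * math.log(2.0)           # k = 1: log sqrt(2)
variance_k1 = math.log(2.0)                # k = 1: log 2
base10_factor = math.log10(math.e) ** 2    # variance scale factor when log10 is used
print(round(mean_shift, 3), round(variance_k1, 3), round(base10_factor, 3))
```

The factor .189 is the square of $\log_{10} e$, which is why it multiplies the natural-log variance when common logarithms are used.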

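If $X$ is distributed according to (5.4), the density function $F'(y)$ of the
corresponding reduced variate $Y = (X - \mu)/[k(\mu + a)]$ is

$$F'(y) = \begin{cases} \dfrac{k}{\sqrt{2\pi\log(k^2 + 1)}}\,\dfrac{1}{1 + ky}\,\exp\left\{-\dfrac{\left[\log\{(1 + ky)\sqrt{k^2 + 1}\}\right]^2}{2\log(k^2 + 1)}\right\}, & y > -\dfrac{1}{k}, \\[3ex] 0, & y \le -\dfrac{1}{k}. \end{cases} \tag{5.5}$$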


The d.f. of the variate $W = k^{-1}[\log(X + a) - \log(\mu + a)]$ is $F[(e^{kw} - 1)/k]$,
and, of course, the distribution of $W$ is normal with mean $-k^{-1}\log\sqrt{k^2 + 1}$
and variance $k^{-2}\log(k^2 + 1)$. These are the respective values of the integrals
in (5.2) and (5.3).
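
As a numerical illustration of this statement, the following Python sketch integrates the density (5.5) by the midpoint rule and compares $F[(e^{kw} - 1)/k]$ with the normal d.f. having the stated mean and variance. The function names, the choice $k = 0.5$, and the step count are assumptions made only for this example.

```python
import math

def density_55(y, k):
    """Density (5.5) of the reduced variate Y."""
    if y <= -1.0 / k:
        return 0.0
    s2 = math.log(k * k + 1.0)
    z = math.log((1.0 + k * y) * math.sqrt(k * k + 1.0))
    return k / (math.sqrt(2.0 * math.pi * s2) * (1.0 + k * y)) * math.exp(-z * z / (2.0 * s2))

def cdf_55(y, k, steps=20000):
    """F(y), approximated by the midpoint rule on (-1/k, y)."""
    lower = -1.0 / k
    if y <= lower:
        return 0.0
    h = (y - lower) / steps
    return h * sum(density_55(lower + (i + 0.5) * h, k) for i in range(steps))

def normal_cdf(w, mean, sd):
    return 0.5 * (1.0 + math.erf((w - mean) / (sd * math.sqrt(2.0))))

k = 0.5
mean_w = -math.log(math.sqrt(k * k + 1.0)) / k   # -k^{-1} log sqrt(k^2 + 1)
sd_w = math.sqrt(math.log(k * k + 1.0)) / k      # square root of k^{-2} log(k^2 + 1)
gaps = [abs(cdf_55((math.exp(k * w) - 1.0) / k, k) - normal_cdf(w, mean_w, sd_w))
        for w in (-2.0, -1.0, 0.0, 1.0, 2.0)]
print(max(gaps))   # should be small, limited only by the quadrature error
```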

If now the distribution of $X$ depends on a parameter $n$ in such a way that as
$n \to \infty$, the distribution of the corresponding reduced variate $Y = (X - \mu_n)/[k_n(\mu_n + a)]$
tends to the distribution given by (5.5), it follows from the above
remarks and from Theorem 5.1 that the variate $W$ given by (5.1) has a normal
limiting distribution. Furthermore, under the uniform convergence condition
of Theorem 5.2, it follows that $\sigma_T^2$ tends to the value $\log(k^2 + 1)$, where $T = \log(X + a)$.

These facts provide a sound mathematical basis for the use of the logarithmic
transformation, which has had a long history of empirical success in problems of
normalization [12, chapter XVI] and stabilization ([6], [16]). When it appears
from a reasonably large number of observations on a variate (which is essentially
bounded from below) that the standard deviation of the variate is proportional
to the mean, then a possible specification for the variate is a distribution of the
form (5.4); or, at least for large values of $\mu$, it may be assumed that the distribution
of the reduced variate is given by (5.5). Then the variate $T = \log(X + a)$,
where $-a$ is any number less than the lower bound of $X$, will be exactly or
approximately normally distributed with a variance independent of the value of $\mu$.
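
A small simulation may make the practical content of this paragraph concrete. The sketch below is not part of the original argument: it draws variates from the specification (5.4) for several values of $\mu$, with $k$, $a$, the sample size, and the seed chosen arbitrarily, and reports the ratio of the standard deviation of $X$ to $\mu + a$ together with the variance of $\log(X + a)$.

```python
import math
import random
import statistics

def simulate(mu, a, k, size, rng):
    """Draw X from density (5.4) by exponentiating a normal T and subtracting a."""
    var_t = math.log(k * k + 1.0)
    mean_t = math.log(mu + a) - 0.5 * var_t
    return [math.exp(rng.gauss(mean_t, math.sqrt(var_t))) - a for _ in range(size)]

rng = random.Random(1943)   # arbitrary seed, used only for reproducibility
k, a = 0.5, 1.0
summary = {}
for mu in (5.0, 50.0, 500.0):
    xs = simulate(mu, a, k, 20000, rng)
    ratio = statistics.pstdev(xs) / (statistics.fmean(xs) + a)      # estimates k
    var_log = statistics.pvariance([math.log(x + a) for x in xs])   # estimates log(k^2 + 1)
    summary[mu] = (ratio, var_log)
    print(mu, round(ratio, 2), round(var_log, 2))
```

On the original scale the standard deviation grows in proportion to $\mu + a$, while on the logarithmic scale the variance stays near $\log(k^2 + 1)$ for every $\mu$.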

Since (5.4) is only one of an infinity of various different types of distribution 





12 Finney [11] has considered the problem of efficiently estimating the variance of the X 
of Theorem 5.3 in the case a = 0. (The actual density function (5.4) appears nowhere in 
his paper.) 

13 Given (without explanation) by Cochran [6, p. 165].










in which the mean and standard deviation are proportional, the user of a loga- 
rithmic transformation in the analysis of variance should always apply tests for 
departure from normality to the observed distribution of T values. From the 
point of view of specification, the situation here would seem to be less reassuring 
than in the cases considered in section 4. While it is true that the Poisson 
exponential distribution is only one of many types of distribution in which the 
variance and mean are equal, nevertheless the specification of a Poisson distribu- 
tion can generally be preceded by a fairly strong chain of a priori inductive 
reasoning. This would not seem to be the case in the specification of (5.4). 
Theorems 5.1 and 5.2 furnish some grounds for a suspicion that the logarithmic 
transformation may possibly be more successful in stabilizing the variance than 
in normalizing the data. The burden of proof, however, lies with the experimenter.¹⁴


REFERENCES 


[1] M. S. Bartlett, “The square root transformation in the analysis of variance,” Jour.
Roy. Stat. Soc. Suppl., Vol. 3 (1936), pp. 68-78.
[2] Geoffrey Beall, “The transformation of data from entomological field experiments
so that the analysis of variance becomes applicable,” Biometrika, Vol. 32 (1942),
pp. 243-262.
[3] C. I. Bliss, “The transformation of percentages for use in the analysis of variance,”
Ohio Jour. Science, Vol. 38 (1938), pp. 9-12.
[4] A. Clark and W. H. Leonard, “The analysis of variance with special reference to
data expressed as percentages,” Jour. Amer. Soc. Agron., Vol. 31 (1939), pp.
55-56.
[5] W. G. Cochran, “The analysis of variance when experimental errors follow the Poisson
or binomial laws,” Annals of Math. Stat., Vol. 9 (1940), pp. 335-347.
[6] W. G. Cochran, “Some difficulties in the statistical analysis of replicated experiments,”
Empire Jour. Expt. Agric., Vol. 6 (1938), pp. 157-175.
[7] H. Cramér, Random Variables and Probability Distributions, Cambridge, 1937.
[8] J. H. Curtiss, “A note on the theory of moment generating functions,” Annals of
Math. Stat., Vol. 13 (1942), pp. 430-433.
[9] J. H. Curtiss, “Convergent sequences of probability distributions,” Am. Math.
Monthly, Vol. 50 (1943), pp. 94-105.
[10] G. C. Evans, The Logarithmic Potential, New York, 1927.
[11] D. J. Finney, “On the distribution of a variate whose logarithm is normally distributed,”
Jour. Roy. Stat. Soc. Suppl., Vol. 7 (1941), pp. 155-161.
[12] Arne Fisher, The Mathematical Theory of Probabilities, Second edition, New York,
1930.
[13] R. A. Fisher and F. Yates, Statistical Tables, London, 1938.
[14] L. H. C. Tippett, “Statistical methods in textile research. Part 2, Uses of the binomial
and Poisson distributions,” Shirley Inst. Mem., Vol. 13 (1934), pp. 35-72.
[15] J. V. Uspensky, Introduction to Mathematical Probability, New York, 1937.
[16] C. B. Williams, “The use of logarithms in the interpretation of certain entomological
problems,” Annals of Appl. Biol., Vol. 24 (1937), pp. 404-414.





14 A transformation closely related to the logarithmic one is $T = k^{-1/2}\sinh^{-1}(kX)^{1/2}$, where
$k$ is an estimate of the Charlier coefficient of disturbancy of a Poisson distribution. This
transformation has recently been studied from an empirical point of view by Beall [2];
it was suggested by the heuristic argument of section 2 applied to the case in which
$\sigma^2 = \mu + k\mu^2$. Beall presents evidence that for the particular data which he considered, the
transformation seemed to stabilize the variance and normalize. A mathematical theory
would follow the lines laid down above in the case of $T = \log(X + a)$.













ON FUNDAMENTAL SYSTEMS OF PROBABILITIES OF A FINITE 
NUMBER OF EVENTS 


By Kai Lai Chung
Tsing Hua University, Kunming, China 


We consider a probability function $P(E)$ defined over the Borel set of events
generated by the $n$ arbitrary events $E_1, \dots, E_n$, which will be denoted by
$\mathfrak{L}(1, \dots, n)$.

We use the same notations as in the author's former paper¹, with the following
abbreviations. We denote a combination $(\alpha_1 \cdots \alpha_a)$ simply by $(\alpha)$, and use
the corresponding Latin letter $a$ for its number of members. Similarly we write
$(\beta)$ for $(\beta_1 \cdots \beta_b)$, but $(\nu)$ for $(1, \dots, n)$. We say that $(\beta)$ belongs to $(\alpha)$ and
write $(\beta) \in (\alpha)$ when and only when the set $(\beta_1 \cdots \beta_b)$ is a subset of $(\alpha_1 \cdots \alpha_a)$.
Then and then only we write $(\alpha) - (\beta)$ for the subset of elements of $(\alpha)$ that do
not belong to $(\beta)$; thus we may write it as $(\gamma)$ with $c = a - b$. When and only
when $(\alpha)$ and $(\beta)$ have no common elements, we write $(\alpha) + (\beta)$ for the set of
elements that belong either to $(\alpha)$ or to $(\beta)$; thus we may write it as $(\gamma)$, with
$c = a + b \le n$. We note the case for empty sets: $(0) + (0) = (0)$. Now we
can write $p_{[\alpha]}$ for $p_{[\alpha_1 \cdots \alpha_a]}$, $p_{(\alpha)}$ for $p_{\alpha_1 \cdots \alpha_a}$, $p_b((\alpha))$ for $p_b(\alpha_1 \cdots \alpha_a)$, etc.
Further we denote by $p_{[b]}((\alpha))$ $(1 \le b \le a \le n)$ the probability of the occurrence
of exactly $b$ events out of $E_{\alpha_1}, \dots, E_{\alpha_a}$, and write

$$P_a^{(m)}((\nu)) = \sum_{(\alpha) \in (\nu)} p_m((\alpha)), \qquad P_a^{[m]}((\nu)) = \sum_{(\alpha) \in (\nu)} p_{[m]}((\alpha));$$


since $a$ is fixed by the left-hand sides, the summations on the right-hand sides
are to be extended to all the $\binom{n}{a}$ combinations $(\alpha)$ of $(\nu)$.
A sum written $\sum_{(\beta) \in (\alpha)}$ is to be extended to all combinations $(\beta)$, $b = 0, 1, \dots, a$,
belonging to $(\alpha)$, when $b$ is not previously fixed; it is to be extended to all the
$\binom{a}{b}$ combinations belonging to $(\alpha)$, when $b$ is previously fixed.


DEFINITION 1. A system of quantities is said to form a fundamental system of 
probabilities for a set of events if and only if the probability of every event in the 
set can be expressed in terms of these quantities. 

DEFINITION 2. An event in $\mathfrak{L}(1, \dots, n)$ is said to be symmetrical if and only
if it is identical with every event obtained by interchanging any pair of suffixes
$(i, j)$ $(i, j = 1, \dots, n)$ in the definition of it. The subset of symmetrical events
in $\mathfrak{L}(1, \dots, n)$ will be denoted by $\mathfrak{S}(1, \dots, n)$.

From the normal form² of every event in $\mathfrak{L}(1, \dots, n)$ and the principle of


1 “On the probability of the occurrence of at least m events among n arbitrary events,” 
Annals of Math. Stat., Vol. 12, 1941. 
2 See Hilbert-Ackermann, Grundzüge der theoretischen Logik, Chap. 1.




total probabilities, we can easily see the truth of the following theorems, which 
may of course be made more precise. 

THEOREM. The system of $p_{[\alpha]}$, $(\alpha) \in (\nu)$, $2^n$ in number, forms a fundamental
system for $\mathfrak{L}(1, \dots, n)$.

THEOREM. The system of $p_{[a]}((\nu))$, $0 \le a \le n$, $n + 1$ in number, forms a
fundamental system for $\mathfrak{S}(1, \dots, n)$.

Next, a theorem of Broderick³, in a less precise form, may be stated:

The system of $p_{(\alpha)}$ $(p_{(0)} = 1)$, $(\alpha) \in (\nu)$, $2^n$ in number, forms a fundamental
system for $\mathfrak{L}$.

We may add in an easy way the following

THEOREM. The system of $S_a((\nu))$ $(S_0((\nu)) = 1)$, $0 \le a \le n$, $n + 1$ in number,
forms a fundamental system for $\mathfrak{S}$.

In the present paper we shall prove, inter alia, the following four theorems 
of the above type, stated in more precise forms. 

THEOREM 1. For any $E$ in $\mathfrak{L}$, we have

$$P(E) = c_0 + \sum_{\substack{(\alpha) \in (\nu) \\ a \ne 0}} c_\alpha\, p_1((\alpha)),$$

where $c_0 = 0$ or $1$ and the $c_\alpha$'s are integers; and they are unique⁴.

THEOREM 2. For any $E$ in $\mathfrak{S}$, we have

$$P(E) = c_0 + \sum_{a=1}^{n} c_a\, P_a^{(1)}((\nu)),$$

where $c_0 = 0$ or $1$ and the $c_a$'s are integers; and they are unique.

THEOREM 3. For any $E$ in $\mathfrak{L}$, we have

$$P(E) = d_0 + \sum_{\substack{(\alpha) \in (\nu) \\ a \ne 0}} d_\alpha\, p_{[1]}((\alpha)),$$

where $d_0 = 0$ or $1$ and the $d_\alpha$'s are rational numbers; and they are unique.

THEOREM 4. For any $E$ in $\mathfrak{S}$, we have

$$P(E) = d_0 + \sum_{a=1}^{n} d_a\, P_a^{[1]}((\nu)),$$

where $d_0 = 0$ or $1$ and the $d_a$'s are rational numbers; and they are unique.

Less precisely, we may say that the system of $p_1((\alpha))$ or $p_{[1]}((\alpha))$ forms a
fundamental system for $\mathfrak{L}$; the system of $P_a^{(1)}((\nu))$ or $P_a^{[1]}((\nu))$ forms a
fundamental system for $\mathfrak{S}$.

In fact however, we shall give much more than the mere proofs of 


3 Fréchet, “Compléments à un théorème de T. S. Broderick concernant les événements
dépendants,” Proc. Edinburgh Math. Soc., Ser. 2, Vol. 6 (1939).

4 “Unique” in the sense that it is impossible to replace therein the coefficients c by other
numbers which are independent of the Borel set of events and the probability function.




these theorems. We shall establish the following explicit formulas for the 
general parameter m. 


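$$\begin{aligned}
\text{(1.1)}\quad \text{(i)}\quad & p_{[0]} = 1 - p_1((\nu)), \\
\text{(ii)}\quad & p_{[\alpha]} = \sum_{(\beta) \in (\alpha)} (-1)^{b-1}\, p_1((\nu) - (\alpha) + (\beta)),{}^{5} \qquad 1 \le a \le n.
\end{aligned}$$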


a 1 n min(¢,n—a) 


m ¢ in 2 i 
Pua) = (—1) —— i Be ait us (—1) “. - ius n) 


1 
” Pm((v) — (6) + (6)), n>a>m>2 
(8) e(v)—(a) P 
(y)—(6) e(a) 
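$$\text{(2.1)}\qquad p_{[a]}((\nu)) = \sum_{\substack{b = n-a \\ b \ne 0}}^{n} (-1)^{b-n+a-1} \binom{b}{n-a}\, P_b^{(1)}((\nu)), \qquad 1 \le a \le n.$$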
(2) Pio ((v)) = > (—1)""L(n, a, b, m)P,” ((v)), n>a>m> 2, 
where 
0 », b<n-a+m-l, 
( yw ( a )" b o 1 
- ‘ =n-a+m-—l, 
L(n, a,b, m) = ditties 


(—1)"*(m — 1)!(6—m)! 
-(a—m)!{ab—n(m—-1} b>n-a+m-—l1. 
al(n —a)!(a+b—n—m-+1)!’ 
@) Gi) prot) = 1-20 ("— 1) Pe 


n c=l c — 1 
nm min(c,n—a) -1 
m™m c—d = 1 
a)) = —]l = —1 
ne ot. 2s 
hy 


” i cy Pi) — @) +),  nzazm21. 
Os.) 


4) rao = ds (orn ™(2) pity), ndadm21. 


m+n = & m 


A simpler derivation of (1) than that given in an earlier paper¹ follows. Let
us write Poincaré's formula as follows:

$$p_m((\beta)) = \sum_{c=m}^{b} (-1)^{c-m} \binom{c-1}{m-1} S_c((\beta)).$$


5 Obviously we mean $((\nu) - (\alpha)) + (\beta)$ and $((\gamma) - (\delta)) + (\delta)$ respectively; similarly in the
sequel.










Then for a fixed $b \ge m$, summing over all $(\beta) \in (\nu)$, we get


pay) = D(-" (2 — (5 2) so. 


(8B) e(») 
Hence 


i (-1)"™ oes Pm((8)) = E : . ? S.-((»)) . (—1)"° § 7 ‘) | 
(67 1) song if c=n | 


ae \m 0 if c<n 


= (% = 1) son = (27 1) oo». 


A change of notation gives, for a + b = m, 


(1) 


— a+b 
(° . - 1 ') Poor) = 2 (-1)"" 2 pa((y)). 
cm (7) (a) + (8) 
Hence 
a+b-1 

( a= . ees 

a+b ai min (¢,n—a) - aaa! d 

= 2(-1"" 2 P bd ) do. Pm((y) — (8) + (8)). 
c=m =max (0,c—a) (8) €(¥) —(a) 


(7)—(8) e(a) 


Substituting in the well-known formula, for $a \ge 1$


n—a 
b 
Pia) = Zz (—1) ps P((a)+(8)) » 
b=0 (8) e(»)—(a) 


we get for $n \ge a \ge m$


min (¢,n—a) 


Pie) = Zz (—1)*™ 
c=—m d=max (0,c—a) 


een PD -— 0 + OLE or" 7 S7TAOFP TN 


(7)—(8) e(a) 


(1) 


Thus the problem reduces to the summation of the following series: 


Bossa ae). 


Case 1: m = 1. In this case the series reduces to 


$$\sum_{b=d}^{n-a} (-1)^b \binom{n-a-d}{b-d} = \begin{cases} (-1)^{n-a} & \text{if } d = n - a, \\ 0 & \text{if } d < n - a. \end{cases}$$













Hence for $a \ge 1$,


Pron= > (-—1)" DY pil) —(a) +(y) — ((@) — (a)))(-1)""* 


(1,2—a) (y)—(C(r)—(a)) € (a) 


Writing (y) — ((¥) — (@)) = (6), we obtain 


$$p_{[\alpha]} = \sum_{b = \max(1-n+a,\,0)}^{a} (-1)^{b-1} \sum_{(\beta) \in (\alpha)} p_1((\nu) - (\alpha) + (\beta)).$$


This is equivalent to (1.1), (ii), while (i) is trivial. 
Case 2: $m \ge 2$. We have, for $c \ge 1$,


BolT) -aeCs22') - 


which is easily proved by induction on a. 
Hence for $m \ge 2$,


beat 27 
-"> (- re ae Ss 
im 


m—l1 


= (-—1)° 4 (-y( “se - -— +d-—-1+ “y 


m-—l1 
am—-1f n+2 \" 
“< yeat(,7t?.) 


Substituting in (1) we get formula (1). 
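To derive formula (2.1) for a fixed $a$, $1 \le a \le n$, we sum (1.1, ii), which gives

$$p_{[a]}((\nu)) = \sum_{(\alpha) \in (\nu)} p_{[\alpha]} = \sum_{\substack{b = 0 \\ n-a+b \ne 0}}^{a} (-1)^{b-1} \sum_{(\alpha) \in (\nu)} \sum_{(\beta) \in (\alpha)} p_1((\nu) - (\alpha) + (\beta)).$$

Letting $(\nu) - (\alpha) + (\beta) = (\gamma)$, we get

$$p_{[a]}((\nu)) = \sum_{\substack{c = n-a \\ c \ne 0}}^{n} (-1)^{c-n+a-1} \binom{c}{n-a} \sum_{(\gamma) \in (\nu)} p_1((\gamma)),$$

which is formula (2.1).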
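
Formula (2.1) can be checked by brute force on a small example. The Python sketch below is illustrative only: it assigns arbitrary exact probabilities to the $2^n$ atoms (the sets of events that occur), computes the probability of exactly $a$ occurrences directly, and compares it with the right-hand side of (2.1) in the form $\sum_b (-1)^{b-n+a-1}\binom{b}{n-a} P_b^{(1)}((\nu))$. All function names, the seed, and the choice $n = 4$ are assumptions of the example.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
import random

def atom_probabilities(n, seed=0):
    """Arbitrary exact probabilities for the 2**n atoms (sets of occurring events)."""
    rng = random.Random(seed)
    subsets = [frozenset(c) for size in range(n + 1) for c in combinations(range(n), size)]
    weights = [rng.randint(1, 9) for _ in subsets]
    total = sum(weights)
    return {s: Fraction(w, total) for s, w in zip(subsets, weights)}

def p_at_least_one(atoms, gamma):
    """p_1((gamma)): probability that at least one event indexed by gamma occurs."""
    return sum(p for occurring, p in atoms.items() if occurring & gamma)

def big_p1(atoms, n, c):
    """P_c^{(1)}((nu)): sum of p_1((gamma)) over all c-combinations gamma of (nu)."""
    return sum(p_at_least_one(atoms, frozenset(g)) for g in combinations(range(n), c))

def exactly(atoms, a):
    """p_[a]((nu)): probability that exactly a of the n events occur."""
    return sum(p for occurring, p in atoms.items() if len(occurring) == a)

def formula_2_1(atoms, n, a):
    total = Fraction(0)
    for b in range(max(1, n - a), n + 1):
        sign = -1 if (b - n + a - 1) % 2 else 1   # (-1)^(b-n+a-1), kept as an integer
        total += sign * comb(b, n - a) * big_p1(atoms, n, b)
    return total

n = 4
atoms = atom_probabilities(n)
checks = [exactly(atoms, a) == formula_2_1(atoms, n, a) for a in range(1, n + 1)]
print(all(checks))
```

Exact rational arithmetic is used so that the comparison is an identity check rather than a floating-point approximation.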
The following form of Poincaré’s formula is of assistance in deriving (2): 


$$p_{[a]}((\nu)) = \sum_{b=a}^{n} (-1)^{b-a} \binom{b}{a} S_b((\nu)).$$











Substituting from (1), we get 


Pil) = (-* (‘) (< _ +) y(-0"" (” 7 2 PS” ((v)) 


ie > (—1)™ PS”((v)) a — 3) (" = ) (m ie i) } 


Thus the problem reduces to the summation of the following series: 


Lieven) = Fo) E = aa) 


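First, we have, for $z \ge 0$, $y \ge w$,

$$\sum_{x=0}^{z} (-1)^x \binom{z}{x} (x + y)(x + y - 1) \cdots (x + w) = \begin{cases} 0 & \text{if } y - w + 1 < z, \\[1ex] \dfrac{(-1)^z\, y!\,(y + 1 - w)!}{(z + w - 1)!\,(y + 1 - w - z)!} & \text{if } y - w + 1 \ge z, \end{cases}$$

which may be easily proved by induction on $z$.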
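
The identity can be spot-checked numerically. The following Python sketch compares both sides for a grid of small values; it assumes $w \ge 1$ so that every factorial has a non-negative argument, and the function names `lhs` and `rhs` are hypothetical.

```python
from math import comb, factorial

def lhs(z, y, w):
    """Sum over x of (-1)^x C(z, x) (x + y)(x + y - 1)...(x + w)."""
    total = 0
    for x in range(z + 1):
        product = 1
        for t in range(w, y + 1):
            product *= x + t
        total += (-1) ** x * comb(z, x) * product
    return total

def rhs(z, y, w):
    """Closed form on the right-hand side of the identity."""
    if y - w + 1 < z:
        return 0
    numerator = (-1) ** z * factorial(y) * factorial(y + 1 - w)
    return numerator // (factorial(z + w - 1) * factorial(y + 1 - w - z))

agree = all(lhs(z, y, w) == rhs(z, y, w)
            for z in range(0, 6)
            for w in range(1, 5)
            for y in range(w, 8))
print(agree)
```

The left side is $(-1)^z$ times a $z$-th finite difference of a polynomial of degree $y - w + 1$ in $x$, which explains why it vanishes when $y - w + 1 < z$.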
Next, we have 
_ (m—1)! . _ (0 feo 
L(n, a, b, m) _— a! militss ( 1) e= b (c ae a)! 
_ (m— 1)! < ve po ~ *) (c’ + b)(c’ +b—m)! 
— a! iui ts ( " c’ ~ (c’ ~ b— a)! 
_.(m-1)! S&S - 
—_ (—1)’ =_* im (4) 


c’=max(0,a—b) 


(07 ethos Ot e- 1)(c' +b — m)! 
¢ (c’ +b — a)! 
= (-1)"" =_= {T(n, a, b, m) + (m — 1)T(n, a, b, m + 1)}, 


where 


| _ (?, Sete 
7 » “ ” aii 1) ( c (c+ b — a)!- 


0 if b<n-—-a+m-—l, 
=\(-1)""@-—m+1)!6-—m+))! , 7 7 
. Cy Ei Cy eT si wee 


by the preceding formula. Thus we get the explicit expression for L(n, a, b, m) 


given in formula (2), which is thereby proved. 


The derivations of formulas 3 and 4 are similar to the above and may be 
omitted. 










Now we can give the essential argument for Theorems 1-4. It is evident
that for any $E$ in $\mathfrak{L}$, we have

$$P(E) = \sum p_{[\alpha]},$$

where the summation extends to certain combinations $(\alpha) \in (\nu)$. Substituting
from formula (1.1) we get Theorem 1; substituting from formula (3) we get
Theorem 3. Next, for any $E$ in $\mathfrak{S}$, we have

$$P(E) = \sum p_{[a]}((\nu)),$$

where the summation extends to certain values of $a$. Substituting from formula
(1.1), (i) and formula (2) we get Theorem 2; substituting from formula (2), (i)
and formula (4) we get Theorem 4. We may note these proofs are “constructive”.

It remains to prove the uniqueness of the coefficients in Theorems 1-4. For
Broderick's theorem this has been done by Fréchet³, by introducing “independent
events”. Our proof will be based on the conditions of existence, also
initiated by Fréchet⁶, for the systems $p_1((\alpha))$, $p_{[1]}((\alpha))$, $P_a^{(1)}((\nu))$, $P_a^{[1]}((\nu))$.

The conditions of existence of the system $p_1((\alpha))$ have been given by the
author in the paper¹, though the proof there is not quite complete.


1. Conditions of existence of the system $P_a^{(1)}((\nu))$. Given $n$ quantities $Q_a^{(1)}$,
$1 \le a \le n$; what are the necessary and sufficient conditions that they may be
the system of $P_a^{(1)}((\nu))$'s, $1 \le a \le n$, of a probability function defined over
G(1,---,n)? 

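From formula (1.1), (i) and formula (2) it is evident that necessary conditions
are, for $1 \le a \le n$,

$$\text{(3)}\qquad \sum_{\substack{b = n-a \\ b \ne 0}}^{n} (-1)^{b-n+a-1} \binom{b}{n-a} Q_b^{(1)} \ge 0, \qquad 1 - Q_n^{(1)} \ge 0,$$

and

$$\text{(4)}\qquad \sum_{a=1}^{n} \sum_{\substack{b = n-a \\ b \ne 0}}^{n} (-1)^{b-n+a-1} \binom{b}{n-a} Q_b^{(1)} + 1 - Q_n^{(1)} = 1.$$

The last condition can be re-written as

$$\sum_{b=1}^{n} Q_b^{(1)} \sum_{a = \max(1,\,n-b)}^{n} (-1)^{b-n+a-1} \binom{b}{n-a} + 1 - Q_n^{(1)} = 1,$$

which reduces to the identity $1 = 1$.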





6 “Conditions d'existence de système d'événements associés à certaines probabilités,”
Jour. de Math., 1940. However, our interpretation of the term would mean instead “conditions
of existence of a probability function defined over a Borel set of events, etc.”










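To show that the conditions (3) are sufficient, put

$$p_{[a]} = \sum_{\substack{b = n-a \\ b \ne 0}}^{n} (-1)^{b-n+a-1} \binom{b}{n-a} Q_b^{(1)}, \qquad 1 \le a \le n,$$

$$p_{[0]} = 1 - Q_n^{(1)}.$$

By (3) and (4) we have, for $0 \le a \le n$,

$$p_{[a]} \ge 0 \quad\text{and}\quad \sum_{a=0}^{n} p_{[a]} = 1.$$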


show that the P{”((v»))’s of this probability function coincide with the given 
Q\’s, so that this is the probability function we seek. We have, 


PPO) = > mG) = Lew, (7-8) 


h=max(1,b—n-+a) b = h 


— = = _1\ornte-1 c mie (*) as \ (1) 
a a 2. ( : a a 2) oe ee h »- ) 7 


Now the series in curl brackets 
n—b 
— _1)\e-"ta-1 c n i = 
- Foon (NG) - Co} 


+ Zora (8) 
= F(a = aE) 


n—b ' 
- Br C59) 
euiithenc ( ) n— b 
If c = n, the last 


-()- Boo ()Ce") 
O)-OEarCs)-$ tz2 


If c < n, we have 
n—b — 
o+(-1" S (-0)* (, : )(” ; " 
.t afc\fa\ fl if b=c; 
(1 2 (-0) (3)(5) - C if bx. 


; 
Hence they are actually the pja)((v))’s of a probability function. We want to 


Therefore 


P3?((»)) = QS”. 













2. Conditions of existence of the system $p_{[1]}((\alpha))$. Given $2^n - 1$ quantities
$q_{[1]}((\alpha))$, $(\alpha) \in (\nu)$, $a \ge 1$, what are the necessary and sufficient conditions that
they may be the system of $p_{[1]}((\alpha))$'s of a probability function defined over
$\mathfrak{L}(1, \dots, n)$?


From formula 3 it is evident that necessary conditions are 


ly “~— ~ayr*( se )" 


TN cml d=max(0,c—a) a + d = 1 
 _ am((y) — (8) + (8) 20, 

(5) (8) « (@)—(a) 

(7)—(8) ¢ (a) 

L1df/n-1\" 
;. x c on ') ote Piuy((y)) = 0; 
and . 
1 4! > > “— (-y"( n—1 )" 

(6) TN (a)e(y) cml d=max(0,c—a) a+d-—1 


beta qui((y) — (8) + (6)) = 1. 


‘7-4 


Consider the sum 


min(e,n—a) a 
__4)4 n-1l1 2 
Xn stile (—1) (, +d- :) salies an ((y) (6) + (6)). 
(y)—(8) e(a) 
For a fixed (6), the number of ways of writing (vy) = (vy) — (6) + (6) is (‘), 


then since (y) — (6) € (a) but (a) — ((vy) — (6)) e€(v) — (y), the number of 
n—c 
a-—-c+dj° 


nm min(c,n—a) afc n-c aw. i -1 
 cadinn "Geo Meta) 


- ~~ 1 —-1 n min(e,n—a) “(5)(° + d ae ') 
— (” = i) 2, sales (“)) d es i 
Therefore the condition (6) reduces to, the identity 1 = 1. 


To show that conditions (6) are sufficient, put the left-hand sides of (5) equal 
to Piay} and py) respectively. Then 


choices of (a) is Thus the coefficient of gi)((y)) in the sum is 


Pita) = i Pita)+(6)] 
(8B) €(¥)—(a) 


eo ES ta a) we 
no GS enncafie~o-0 a+td—1) @d-te) 


md qui((y) — (6) + (6)). 
(5) e(¥)—(a)—(8) 


(7) — (8) e(a) + (6) 


0) 













Let (vy) = (vy) — @) + @), where @) € (a), (vy) — @) €(v) — (a). Then the 
sum in the curl brackets can be written, by a combinatorial calculation, as 


min(a,c) (n—a min(c—f,n—a—b) iil 2 f mews +f nt —1 
2, » pment ( 1) ( d \(; —ct+t+d+ Ma +b+d-— :) } 
qu((y) — (¢) + (¢)). 


(dela 
(y)—(¢) €(»)—(a) 


The sum in the last curl brackets is 


i —1 n—a min(c—f,n—a—b) “ c—f a+b+d-1 
ae 2 ease p-2) (—1y'( d 2) 


Inverting the order of summations, 
om 1 —1 min(c—f,n—a) ac Ky n—a—d (: +4 b + d ai 4 
—1 
(, +e-f- J pa (—}) d Ets a+c-jf-1 
— 1 -1 min(c—f,n—a) P (° 7 n ) 
= ee J siiiillltnasue I) d a+c—f 


“atlas 5-1 Eco’; = ; oo 


8 


o 


Hence (7) reduces to 


n 


Pia) = : DD (-))" fits an((y)). 


c=l 


Then 


S.((a)) = de Pus) = ; of (-1)*" t _ ‘) qin ((5)) 


pu((a)) = 4 (—1)""S,((a)) 
b= 


as are {e (-1)"? (5 7 ‘y qui ((6)) = gmi((@)). 
dys0 


The conditions of existence of the system $P_a^{[1]}((\nu))$, $1 \le a \le n$, are similarly
deduced from formula (3), (i) and formula (4) with $m = 1$.

Now we can prove the uniqueness of the coefficients in Theorems 1-4. Since
the proofs are all exactly similar, we take Theorem 2. Suppose, if possible,
there exists another system of coefficients $c_a'$, $0 \le a \le n$, so that

$$P(E) = c_0 + \sum_{a=1}^{n} c_a\, P_a^{(1)}((\nu)) = c_0' + \sum_{a=1}^{n} c_a'\, P_a^{(1)}((\nu)).$$













Taking the difference, we get a linear polynomial in the variables $P_a^{(1)}((\nu))$,
$1 \le a \le n$, which must vanish:

$$\text{(8)}\qquad c_0 - c_0' + \sum_{a=1}^{n} (c_a - c_a')\, P_a^{(1)}((\nu)) = 0,$$

for all “admissible” values of the variables. These values, say $Q_a^{(1)}$, are precisely
those which satisfy the conditions (3).

It is evidently easy to construct a system of $Q_a^{(1)}$, $1 \le a \le n$, which satisfy the
conditions (3) written with the sign of strict inequality “>”. Hence in a sufficiently
small neighborhood of the point $(Q_1^{(1)}, Q_2^{(1)}, \dots, Q_n^{(1)})$ in the $n$-dimensional
space these strict inequalities still hold. Hence the polynomial vanishes
in this neighborhood and so must vanish identically; that is,

$$c_a - c_a' = 0 \quad\text{for}\quad 0 \le a \le n. \qquad \text{Q. E. D.}$$








ON THE EFFICIENT DESIGN OF STATISTICAL 
INVESTIGATIONS 


By ABRAHAM WALD 
Columbia University 


1. Introduction. A theory of efficient design of statistical investigations has
been developed by R. A. Fisher¹ and his followers mainly in connection with
agricultural experimentation. However, the same methods can be applied to
other fields also. All statistical designs treated in the aforementioned theory
refer to problems of testing linear hypotheses. By testing a linear hypothesis
we mean the following problem: Let $y_1, \dots, y_N$ be $N$ independently and
normally distributed variates with a common variance $\sigma^2$. It is assumed that
the expected value of $y_\alpha$ is given by

$$E(y_\alpha) = \beta_1 x_{1\alpha} + \beta_2 x_{2\alpha} + \cdots + \beta_p x_{p\alpha} \qquad (\alpha = 1, \dots, N) \tag{1}$$

where the quantities $x_{i\alpha}$ $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$ are known constants and
$\beta_1, \dots, \beta_p$ are unknown constants. The coefficients $\beta_1, \dots, \beta_p$ are called
the population regression coefficients of $y$ on $x_1, x_2, \dots,$ and $x_p$, respectively.
The hypothesis that the unknown regression coefficients $\beta_1, \dots, \beta_p$ satisfy a
set of linear equations

$$g_{i1}\beta_1 + \cdots + g_{ip}\beta_p = g_i \qquad (i = 1, \dots, r;\ r \le p), \tag{2}$$

is called a linear hypothesis. The problem under consideration is that of testing
the hypothesis (2) on the basis of the observed values $y_1, \dots, y_N$.


In many cases the experimenter has a certain amount of freedom in the choice
of the values $x_{i\alpha}$. The efficiency of the test is greatly affected by the values of
$x_{i\alpha}$. The statistical investigation is efficiently designed if the values $x_{i\alpha}$ are
chosen so that the sensitivity of the test is maximized. Let us illustrate this
by a simple example. Suppose that $x$ and $y$ have a bivariate normal distribution
and we want to test the hypothesis that the regression coefficient $\beta$ of $y$ on $x$
has a particular value $\beta_0$. Suppose, furthermore, that the test has to be carried
out on the basis of $N$ pairs of observations $(x_1, y_1), \dots, (x_N, y_N)$, where the
experiments are performed in such a way that $x_1, \dots, x_N$ are not random variables
but have predetermined fixed values. It is known that the variance of
the least square estimate $b$ of $\beta$ is inversely proportional to $\sum_{\alpha=1}^{N} (x_\alpha - \bar{x})^2$ where
$\bar{x} = (x_1 + \cdots + x_N)/N$. Hence, if we can freely choose the values $x_1, \dots, x_N$
in a certain domain $D$, the greatest sensitivity of the test will be achieved by
choosing $x_1, \dots, x_N$ so that $\sum (x_\alpha - \bar{x})^2$ becomes a maximum.
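
The example can be made concrete with a short Python sketch. It assumes the standard expression $\sigma^2 / \sum (x_\alpha - \bar{x})^2$ for the variance of the least square slope, takes the domain $D$ to be the interval $[-1, 1]$ for each $x_\alpha$, and compares two hypothetical designs with $N = 6$; the function names and the two designs are illustrative only.

```python
def sum_of_squares(xs):
    """Sum of squared deviations of the design points from their mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def slope_variance(xs, sigma2=1.0):
    """Variance of the least square slope for fixed design points xs."""
    return sigma2 / sum_of_squares(xs)

equally_spaced = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
endpoints_only = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]

for name, design in (("equally spaced", equally_spaced), ("endpoints only", endpoints_only)):
    print(name, round(sum_of_squares(design), 3), round(slope_variance(design), 3))
```

Placing half of the observations at each end of the interval gives the larger sum of squares and therefore the smaller variance of $b$, which is the sense in which that design is the more sensitive one for testing $\beta = \beta_0$.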


In the next section we will introduce a measure of the efficiency of the design 


1 See for instance R. A. Fisher, The Design of Experiments, Oliver and Boyd, London,
1935.




of a statistical investigation for testing a linear hypothesis. In sections 3 and 4 
it will be shown that some well known experimental designs, used widely in 


agricultural experimentation, are most efficient in the sense of the definition 
given in section 2. 


2. A measure of the efficiency of the design of a statistical investigation for
testing a linear hypothesis. The hypothesis (2) can be reduced by a suitable
linear transformation to the canonical form

$$\beta_1 = \beta_2 = \cdots = \beta_r = 0, \qquad (r \le p). \tag{3}$$

Hence, we can restrict ourselves without loss of generality to the consideration
of the hypothesis (3).


Denote $\sum_{\alpha} x_{i\alpha} x_{j\alpha}$ by $a_{ij}$ and let the matrix $\| c_{ij} \|$ be the inverse of the matrix
$\| a_{ij} \|$ $(i, j = 1, \dots, p)$. Denote by $b_i$ the least square estimate of
$\beta_i$ $(i = 1, \dots, p)$. It is known that the estimates $b_1, \dots, b_p$ have a joint
normal distribution with mean values $\beta_1, \dots, \beta_p$, respectively. It is furthermore
known that the covariance of $b_i$ and $b_j$ is equal to $c_{ij}\sigma^2$. The statistic used
for testing the hypothesis (3) is given by

$$F = \frac{N - p}{r} \cdot \frac{\sum_{l=1}^{r} \sum_{m=1}^{r} a^*_{lm}\, b_l b_m}{\sum_{\alpha=1}^{N} (y_\alpha - b_1 x_{1\alpha} - \cdots - b_p x_{p\alpha})^2} \tag{4}$$

where $\| a^*_{lm} \|$ is the inverse of $\| c_{lm} \|$ $(l, m = 1, \dots, r)$. The statistic $F$
has the $F$-distribution with $r$ and $N - p$ degrees of freedom. The critical region
for testing the hypothesis (3) is given by the inequality

$$F \ge F_0, \tag{5}$$

where the constant $F_0$ is determined so that the probability that $F \ge F_0$ (calculated
under the assumption that (3) holds) is equal to the level of significance
we wish to have.


It is known that the power function² of the critical region (5) depends only
on the single parameter

$$\lambda = \frac{1}{\sigma^2} \sum_{l=1}^{r} \sum_{m=1}^{r} a^*_{lm}\, \beta_l \beta_m. \tag{6}$$

Furthermore this power function is a monotonically increasing function of $\lambda$.

The coefficients $a^*_{lm}$ are functions of the quantities $x_{i\alpha}$ $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$.
The choice of the values $x_{i\alpha}$ $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$ is
the better the greater the corresponding value of $\lambda$. If $r = 1$, the expression $\lambda$


2 See for instance P. C. Tang, “The power function of the analysis of variance tests,”
Stat. Res. Mem., Vol. II, 1938.













reduces to $\frac{1}{\sigma^2} a^*_{11}\beta_1^2$. Hence, if $r = 1$, we maximize $\lambda$ by maximizing $a^*_{11}$. Since
$a^*_{11} = 1/c_{11}$, we maximize $\lambda$ by minimizing $c_{11}$. Thus, if $r = 1$, we can say that
we obtain the most powerful test by minimizing $c_{11}$, i.e. by minimizing the
variance of $b_1$. If $r > 1$, the difficulty arises that no set of values
$x_{i\alpha}$ $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$ can be found for which $\lambda$ becomes a maximum
irrespective of the values of the unknown parameters $\beta_1, \dots, \beta_r$. Hence, if
$r > 1$, we have to be satisfied with some compromise solution. For this purpose
let us consider the unit sphere

$$\beta_1^2 + \cdots + \beta_r^2 = 1, \tag{7}$$

in the space of the parameters $\beta_1, \dots, \beta_r$. It is known that the smallest root
in $\rho$ of the determinantal equation

$$\begin{vmatrix} a^*_{11} - \rho & a^*_{12} & \cdots & a^*_{1r} \\ a^*_{21} & a^*_{22} - \rho & \cdots & a^*_{2r} \\ \vdots & \vdots & & \vdots \\ a^*_{r1} & a^*_{r2} & \cdots & a^*_{rr} - \rho \end{vmatrix} = 0, \tag{8}$$

is equal to the minimum value of $\sigma^2\lambda$ on the unit sphere (7). Similarly the
greatest root of (8) is equal to the maximum value of $\sigma^2\lambda$ on the sphere (7). The
compromise solution of maximizing the smallest root of (8) seems to be a very
reasonable one. However, for the sake of certain mathematical simplifications,
we propose to maximize the product of the $r$ roots of (8). Since the product of
the roots of (8) is equal to the determinant

$$\begin{vmatrix} a^*_{11} & \cdots & a^*_{1r} \\ \vdots & & \vdots \\ a^*_{r1} & \cdots & a^*_{rr} \end{vmatrix} \tag{9}$$

we have to maximize the determinant (9). The value of the determinant
$| c_{lm} |$ $(l, m = 1, \dots, r)$ is the reciprocal of that of (9). Hence we maximize
(9) by minimizing the determinant $| c_{lm} |$. The generalized variance of the set
of variates $b_1, \dots, b_r$ is equal to the product of $\sigma^{2r}$ and the determinant $| c_{lm} |$.
Thus, our result can be expressed as follows: The optimum choice of the values
of $x_{i\alpha}$ is that for which the generalized variance of the variates $b_1, \dots, b_r$
becomes a minimum.

Any set of $pN$ values $x_{i\alpha}$ $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$ can be represented
by a point in the $pN$-dimensional Cartesian space. Denote by $D$ the set of all
points in the $pN$-dimensional space which we are free to choose. If $N$ is fixed
and if any point of $D$ can be equally well chosen, the following two definitions
seem to be appropriate:

DEFINITION 1. Denote by $c$ the minimum value of the determinant $| c_{lm} |$
$(l, m = 1, \dots, r)$ in the domain $D$. Then the ratio $c / | c_{lm} |$ is called the efficiency
of the design of the statistical investigation for testing the hypothesis (3).

DEFINITION 2. The design of the statistical investigation for testing the hypothesis
(3) is said to be most efficient if its efficiency is equal to 1.
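
Definition 1 can be illustrated by a minimal Python sketch. It assumes a model with $p = 2$ (a regressor $x_1$ and a constant term $x_2 = 1$), the hypothesis $\beta_1 = 0$ (so $r = 1$), and, as a simplification that is not part of the definition, a domain $D$ restricted to two hypothetical candidate designs; the minimum $c$ is therefore taken over those candidates only. All function names are illustrative.

```python
from fractions import Fraction

def moment_matrix(x):
    """a_ij = sum over alpha of x[i][alpha] * x[j][alpha]."""
    p = len(x)
    return [[sum(Fraction(u) * Fraction(v) for u, v in zip(x[i], x[j]))
             for j in range(p)] for i in range(p)]

def inverse(matrix):
    """Gauss-Jordan inverse in exact arithmetic; assumes a nonsingular matrix."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def determinant(matrix):
    """Laplace expansion along the first row; adequate for small r."""
    if len(matrix) == 1:
        return matrix[0][0]
    return sum((-1) ** j * matrix[0][j]
               * determinant([row[:j] + row[j + 1:] for row in matrix[1:]])
               for j in range(len(matrix)))

def c_determinant(x, r):
    """Determinant |c_lm|, l, m = 1..r, where c is the inverse of the moment matrix."""
    c = inverse(moment_matrix(x))
    return determinant([row[:r] for row in c[:r]])

def efficiencies(designs, r):
    """Efficiency c / |c_lm| of each candidate, with c the minimum over the candidates."""
    dets = {name: c_determinant(x, r) for name, x in designs.items()}
    best = min(dets.values())
    return {name: best / d for name, d in dets.items()}

designs = {
    "spaced": [[-5, -3, -1, 1, 3, 5], [1] * 6],
    "endpoints": [[-5, -5, -5, 5, 5, 5], [1] * 6],
}
eff = efficiencies(designs, r=1)
print(eff)
```

With these candidates the end-point design attains efficiency 1 and is most efficient in the sense of Definition 2, while the equally spaced design has efficiency 7/15.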













3. Efficiency of the Latin square design. A widely used and important
design in agricultural experimentation is the so-called Latin square. Suppose we
wish to find out by experimentation whether there is any significant difference
among the yields of m different varieties $v_1, \cdots, v_m$. For this purpose the
experimental area is subdivided into $m^2$ plots lying in m rows and m columns
and each plot is assigned to one of the varieties $v_1, \cdots, v_m$. If each variety
appears exactly once in each row and exactly once in each column, we have a
Latin square arrangement. Denote by $y_{ijk}$ the yield of the variety $v_k$ on the
plot which lies in the i-th row and j-th column. The subscript k is, of course,
a single valued function of the subscripts i and j, since to each plot only one
variety is assigned. The following assumptions are made: the variates $y_{ijk}$
are independently and normally distributed with a common variance $\sigma^2$ and the
expected value of $y_{ijk}$ is given by

(10)   $$E(y_{ijk}) = \mu_i + \nu_j + \rho_k.$$

The parameters $\sigma^2$, $\mu_i$, $\nu_j$ and $\rho_k$ are unknown. The hypothesis to be tested
is the hypothesis that variety has no effect on yield, i.e.

(11)   $$\rho_1 = \rho_2 = \cdots = \rho_m.$$


We associate the positive integer $\alpha(i, j) = (i - 1)m + j$ with the plot which
lies in the i-th row and j-th column $(i, j = 1, \cdots, m)$. It is clear that for
any positive integer $\alpha \le m^2$ there exists exactly one plot, i.e. exactly one pair
of values i and j, such that $\alpha = \alpha(i, j)$. In the following discussions the symbol
$y_\alpha$ $(\alpha = 1, \cdots, m^2)$ will denote the yield $y_{ijk}$ where the indices i and j are
determined so that $\alpha(i, j) = \alpha$. The plot in the i-th row and j-th column will be
called the $\alpha$-th plot where $\alpha = \alpha(i, j)$.

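We define the symbols $t_{i\alpha}$, $u_{j\alpha}$, $z_{k\alpha}$ $(i, j, k = 1, \cdots, m;\ \alpha = 1, \cdots, m^2)$
as follows: $t_{i\alpha} = 1$ if the $\alpha$-th plot lies in the i-th row, and $t_{i\alpha} = 0$ otherwise.
Similarly $u_{j\alpha} = 1$ if the $\alpha$-th plot lies in the j-th column, and $u_{j\alpha} = 0$ otherwise.
Finally $z_{k\alpha} = 1$ if the k-th variety is assigned to the $\alpha$-th plot, and $z_{k\alpha} = 0$
otherwise. Then equation (10) can be written as

(12)   $$E(y_\alpha) = \mu_1 t_{1\alpha} + \cdots + \mu_m t_{m\alpha} + \nu_1 u_{1\alpha} + \cdots + \nu_m u_{m\alpha} + \rho_1 z_{1\alpha} + \cdots + \rho_m z_{m\alpha}.$$

Denote the arithmetic means $\frac{1}{m^2}\sum_{\alpha=1}^{m^2} t_{i\alpha}$, $\frac{1}{m^2}\sum_{\alpha=1}^{m^2} u_{i\alpha}$ and $\frac{1}{m^2}\sum_{\alpha=1}^{m^2} z_{i\alpha}$ by $\bar t_i$, $\bar u_i$ and
$\bar z_i$ respectively. Let $t'_{i\alpha} = t_{i\alpha} - \bar t_i$, $u'_{i\alpha} = u_{i\alpha} - \bar u_i$, $z'_{i\alpha} = z_{i\alpha} - \bar z_i$, $\mu'_i = \mu_i - \mu_m$,
$\nu'_i = \nu_i - \nu_m$ and $\rho'_i = \rho_i - \rho_m$ for $i = 1, \cdots, m - 1$. Let furthermore
$w_\alpha = 1$ for $\alpha = 1, \cdots, m^2$. Then we have

(13)   $$\begin{aligned}
t_{i\alpha} &= t'_{i\alpha} + \bar t_i w_\alpha, \quad u_{i\alpha} = u'_{i\alpha} + \bar u_i w_\alpha, \quad z_{i\alpha} = z'_{i\alpha} + \bar z_i w_\alpha \qquad (i = 1, \cdots, m - 1),\\
t_{m\alpha} &= (1 - \bar t_1 - \cdots - \bar t_{m-1}) w_\alpha - t'_{1\alpha} - \cdots - t'_{m-1,\alpha},\\
u_{m\alpha} &= (1 - \bar u_1 - \cdots - \bar u_{m-1}) w_\alpha - u'_{1\alpha} - \cdots - u'_{m-1,\alpha},\\
z_{m\alpha} &= (1 - \bar z_1 - \cdots - \bar z_{m-1}) w_\alpha - z'_{1\alpha} - \cdots - z'_{m-1,\alpha}.
\end{aligned}$$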











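From (12) and (13) we obtain

(14)   $$E(y_\alpha) = \mu w_\alpha + \sum_{i=1}^{m-1} \mu'_i t'_{i\alpha} + \sum_{i=1}^{m-1} \nu'_i u'_{i\alpha} + \sum_{i=1}^{m-1} \rho'_i z'_{i\alpha}$$

where

$$\mu = \sum_{i=1}^{m-1} \mu'_i \bar t_i + \sum_{i=1}^{m-1} \nu'_i \bar u_i + \sum_{i=1}^{m-1} \rho'_i \bar z_i + \mu_m + \nu_m + \rho_m.$$

The hypothesis (11) can be written as

(15)   $$\rho'_1 = \rho'_2 = \cdots = \rho'_{m-1} = 0.$$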


This is a linear hypothesis in canonical form as given in (3). The values $z'_{i\alpha}$
$(i = 1, \cdots, m - 1;\ \alpha = 1, \cdots, m^2)$ depend on the way in which the varieties
$v_1, \cdots, v_m$ are assigned to the $m^2$ plots. We will show that we obtain a most
efficient design if we distribute the varieties over the $m^2$ plots in a Latin square
arrangement, i.e. if each variety appears exactly once in each row and exactly
once in each column.


Let $q_{1\alpha} = w_\alpha$, $q_{i+1,\alpha} = t'_{i\alpha}$ $(i = 1, \cdots, m - 1)$, $q_{m+j,\alpha} = u'_{j\alpha}$ $(j = 1, \cdots, m - 1)$
and $q_{2m-1+k,\alpha} = z'_{k\alpha}$ $(k = 1, \cdots, m - 1)$. Denote $\sum_{\alpha=1}^{m^2} q_{i\alpha} q_{j\alpha}$
by $a_{ij}$ $(i, j = 1, 2, \cdots, 3m - 2)$ and let the matrix $\| c_{ij} \|$ be the inverse of the
matrix $\| a_{ij} \|$ $(i, j = 1, \cdots, 3m - 2)$. Let us denote by $\Delta$ the determinant
$|a_{ij}|$ $(i, j = 1, \cdots, 3m - 2)$, by $\Delta_1$ the determinant $|a_{ij}|$ $(i, j = 1, \cdots, 2m - 1)$,
by $\Delta_2$ the determinant $|a_{ij}|$ $(i, j = 2m, \cdots, 3m - 2)$ and by
$\Delta_3$ the determinant $|c_{ij}|$ $(i, j = 2m, \cdots, 3m - 2)$. We have to show that for
the Latin square arrangement $\Delta_3$ becomes a minimum. From a known theorem*
about determinants it follows that

(16)   $$\Delta_3 = \Delta_1/\Delta.$$

Hence, we have merely to show that $\Delta/\Delta_1$ becomes a maximum for the Latin
square arrangement. Denote by $\bar\Delta$, $\bar\Delta_1$ and $\bar\Delta_2$ the values taken by $\Delta$, $\Delta_1$ and
$\Delta_2$, respectively, in the case of a Latin square arrangement. Since, for the Latin
square arrangement, as is known,

$$\sum_{\alpha=1}^{m^2} t'_{i\alpha} z'_{k\alpha} = \sum_{\alpha=1}^{m^2} u'_{j\alpha} z'_{k\alpha} = \sum_{\alpha=1}^{m^2} z'_{k\alpha} w_\alpha = 0 \qquad (i, j, k = 1, \cdots, m - 1)$$

we have

(17)   $$\frac{\bar\Delta}{\bar\Delta_1} = \bar\Delta_2.$$

Since the matrix $\| a_{ij} \|$ $(i, j = 1, \cdots, 3m - 2)$ is positive definite we have

(18)   $$\frac{\Delta}{\Delta_1} \le \Delta_2.$$


* See M. Bôcher, Introduction to Higher Algebra, 1931, p. 31.


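Because of (17) and (18) the Latin square design is proved to be most efficient
if we show that $\Delta_2 \le \bar\Delta_2$.

Denote by $\Delta_4$ the m-rowed determinant $|a_{ij}|$ $(i, j = 1, 2m, 2m + 1, \cdots, 3m - 2)$.
Since $a_{1j} = 0$ for $j \ne 1$, we have

(19)   $$\Delta_4 = a_{11}\Delta_2 = m^2 \Delta_2.$$

Denote $\sum_{\alpha=1}^{m^2} z_{i\alpha} z_{j\alpha}$ by $b_{ij}$ $(i, j = 1, \cdots, m)$. Then

(20)   $$b_{ij} = 0 \ \text{ for } i \ne j, \qquad \text{and} \qquad b_{ii} = N_i,$$

where $N_i$ denotes the number of plots to which the variety $v_i$ has been assigned.
Because of (20) we have

(21)   $$\begin{vmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mm} \end{vmatrix} = N_1 N_2 \cdots N_m.$$

According to (13) we have

(22)   $$\begin{aligned}
z'_{i\alpha} + \bar z_i w_\alpha &= z_{i\alpha} \qquad (i = 1, \cdots, m - 1),\\
-z'_{1\alpha} - \cdots - z'_{m-1,\alpha} + w_\alpha (1 - \bar z_1 - \cdots - \bar z_{m-1}) &= z_{m\alpha}.
\end{aligned}$$

The determinant of these equations is given by

(23)   $$\Delta_5 = \begin{vmatrix}
1 & 0 & 0 & \cdots & 0 & 0 & \bar z_1 \\
0 & 1 & 0 & \cdots & 0 & 0 & \bar z_2 \\
\vdots & & & & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 1 & \bar z_{m-1} \\
-1 & -1 & -1 & \cdots & -1 & -1 & \delta
\end{vmatrix}$$

where $\delta = 1 - \bar z_1 - \bar z_2 - \cdots - \bar z_{m-1}$. It is easy to verify that

(24)   $$\Delta_5 = 1.$$

From (21), (22) and (24) it follows that

(25)   $$\Delta_4 = N_1 N_2 \cdots N_m.$$

Hence, from (19) we obtain

(26)   $$\Delta_2 = N_1 N_2 \cdots N_m / m^2.$$

In the case of a Latin square design we have $N_1 = N_2 = \cdots = N_m = m$. Hence

(27)   $$\bar\Delta_2 = m^{m-2}.$$

Because of the condition $N_1 + N_2 + \cdots + N_m = m^2$, the right hand side of
(26) becomes a maximum when $N_1 = N_2 = \cdots = N_m = m$. Thus $\Delta_2 \le \bar\Delta_2$
and consequently the Latin square design is proved to be most efficient.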
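The argument above can be checked numerically on a small case. The following is a minimal sketch, not part of the original paper: it builds the indicator variables of Section 3 for a 3 x 3 layout, forms the matrix of sums of products in exact rational arithmetic, and evaluates the efficiency as $(\Delta/\Delta_1)/m^{m-2}$. It assumes the domain D consists of all assignments of varieties to the $m^2$ plots; the function names `det` and `efficiency` and the cyclic example layout are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def det(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def efficiency(variety, m):
    """Return (Delta / Delta_1) / m**(m - 2) for an m x m layout.

    variety[i][j] in range(m) is the variety on the plot in row i, column j.
    """
    plots = list(product(range(m), range(m)))
    n = m * m

    def centered(indicator):
        mean = Fraction(sum(indicator), n)
        return [x - mean for x in indicator]

    q = [[Fraction(1)] * n]                                                  # w
    q += [centered([int(i == r) for i, j in plots]) for r in range(m - 1)]   # t'
    q += [centered([int(j == c) for i, j in plots]) for c in range(m - 1)]   # u'
    q += [centered([int(variety[i][j] == k) for i, j in plots])
          for k in range(m - 1)]                                             # z'
    a = [[sum(x * y for x, y in zip(qi, qj)) for qj in q] for qi in q]
    delta = det(a)
    delta_1 = det([row[:2 * m - 1] for row in a[:2 * m - 1]])
    return (delta / delta_1) / Fraction(m) ** (m - 2)


m = 3
latin = [[(i + j) % m for j in range(m)] for i in range(m)]  # cyclic Latin square
swapped = [row[:] for row in latin]
swapped[0][0], swapped[0][1] = swapped[0][1], swapped[0][0]  # breaks the column rule
print(efficiency(latin, m), efficiency(swapped, m))
```

The Latin square attains efficiency 1, while the layout with two varieties exchanged within a row falls strictly below 1, in line with (17) and (18).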










4. Efficiency of Graeco-Latin and higher squares. Consider m varieties
$v_1, \cdots, v_m$ and m treatments $q_1, \cdots, q_m$. Suppose that we wish to find out
by experimentation whether the yield is affected by varieties or treatments.
For this purpose the experimental area is subdivided into $m^2$ plots lying in m
rows and m columns and to each plot one of the varieties and one of the treatments
is assigned. We call this arrangement a Graeco-Latin square if the following
conditions are fulfilled: 1) each variety appears exactly once in each row and
exactly once in each column; 2) each treatment appears exactly once in each row
and exactly once in each column; 3) each variety is combined with each of the
treatments exactly once.

The following general abstract scheme includes the Latin square and Graeco-Latin
square as special cases: Consider an r-way classification with m classes in
each classification. Denote by $y_{a_1 a_2 \cdots a_r}$ the value of a certain characteristic of
an individual who is classified in the $a_1$-class of the first classification, in the
$a_2$-class of the second classification, $\cdots$, and in the $a_r$-class of the r-th classification.
Suppose that $m^2$ observations are made for the purpose of investigating
the effect of the classes on the value of the characteristic under consideration.
We will say that we have a generalized Latin square design if the following
condition is fulfilled: Let $i$, $j$, $m'$ and $m''$ be an arbitrary set of four positive integers
for which $i \ne j$, $i \le r$, $j \le r$, $m' \le m$ and $m'' \le m$. Then among the $m^2$ individuals
observed there exists exactly one individual who belongs to the $m'$-class of the
i-th classification and $m''$-class of the j-th classification.

It is clear that if r = 3 the above scheme is a Latin square. If r = 4 we have
a Graeco-Latin square.
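The defining condition of a generalized Latin square can be tested mechanically. The sketch below is an illustration under stated assumptions, not part of the original paper: each observed individual is represented as a tuple of r class labels taken from `range(m)`, and the function name `is_generalized_latin_square` is hypothetical. Because exactly $m^2$ individuals are observed, every pair of classes occurring exactly once is equivalent to all $m^2$ label pairs being distinct for each pair of classifications.

```python
from itertools import combinations


def is_generalized_latin_square(individuals, m):
    """Check the pairwise 'exactly once' condition for an r-way classification.

    individuals: list of r-tuples of class labels, each label in range(m).
    """
    if len(individuals) != m * m:
        return False
    r = len(individuals[0])
    for ind in individuals:
        if len(ind) != r or any(c not in range(m) for c in ind):
            return False
    # For every pair of classifications, all m*m label pairs must be distinct.
    for i, j in combinations(range(r), 2):
        if len({(ind[i], ind[j]) for ind in individuals}) != m * m:
            return False
    return True


m = 3
# r = 4: (row, column, variety, treatment), a Graeco-Latin square of order 3.
graeco_latin = [(i, j, (i + j) % m, (i + 2 * j) % m)
                for i in range(m) for j in range(m)]
# Variety and treatment always coincide, so condition 3) fails.
not_orthogonal = [(i, j, (i + j) % m, (i + j) % m)
                  for i in range(m) for j in range(m)]
print(is_generalized_latin_square(graeco_latin, m))
print(is_generalized_latin_square(not_orthogonal, m))
```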

Assume that the observations $y_{a_1 \cdots a_r}$ $(a_1, a_2, \cdots, a_r = 1, \cdots, m)$ are normally
and independently distributed with a common variance $\sigma^2$. Assume
furthermore that the expected value of $y_{a_1 \cdots a_r}$ is given by

$$E(y_{a_1 a_2 \cdots a_r}) = \gamma_{1 a_1} + \cdots + \gamma_{r a_r}.$$

The parameters $\sigma^2$ and $\gamma_{ia}$ $(i = 1, \cdots, r;\ a = 1, \cdots, m)$ are unknown constants.
Suppose that we wish to test the hypothesis that

(28)   $$\gamma_{11} = \gamma_{12} = \cdots = \gamma_{1m}.$$

It can be shown that if the number of observations is limited to $m^2$, we obtain a
most efficient design by constructing a generalized Latin square. The proof of
this statement is similar to that of the efficiency of the Latin square and is
therefore omitted.













SOME SIGNIFICANCE TESTS FOR NORMAL BIVARIATE 
DISTRIBUTIONS 


By D. S. Villars and T. W. Anderson
United States Rubber Company, Passaic, New Jersey, and Princeton University 


1. Introduction. In the theory of linear regression of y on x where y is normally
distributed about a linear function of x, say $\nu + \beta x$, where x is a "fixed"
variate, the t-test for the hypothesis that $\beta$ is zero (that y is distributed
about $\nu$, independent of x) is well known. In this paper we apply some general
statistical theory to the similar problem where x and y are jointly normally
distributed. This case is commonly known as the case of "error in both variates."
We derive a criterion for testing the hypothesis that the population
means are the coordinates of a specified point when the ratio of the variances
and the population correlation coefficient are known. When the ratio of variances
is known, a criterion is derived to test whether the correlation coefficient
is zero.


2. The means. Let us consider a sample of n pairs of observations $(x_1, y_1;$
$x_2, y_2; \cdots; x_n, y_n)$ from a normal bivariate population. Let the variances of
x and of y be $\sigma_x^2$ and $\sigma_y^2$, respectively; and the correlation coefficient, say $\rho$, be
zero. Suppose the ratio of the weight of y to the weight of x, say $\gamma = w_y/w_x =$
$\sigma_x^2/\sigma_y^2$, is known although the variances are not known. It is clear then, that
$\sqrt{\gamma}\, y$ has variance $\sigma_x^2$. Since the observations $y_i$ $(i = 1, 2, \cdots, n)$ can be transformed
into revised observations $\sqrt{\gamma}\, y_i = y'_i$, we lose no generality by assuming
that x and y are both distributed with variance $\sigma^2$.

Under the assumption of equality of variances and independence of variates
we shall derive a criterion for testing the null hypothesis that each observation
$x_i$ is of a variate distributed about the same population mean $\mu$ and each observation
$y_i$ is of a variate distributed about the same population mean $\nu$. The
hypothesis may be stated symbolically as:

$$H_0: E(x) = \mu, \quad E(y) = \nu$$

given $\sigma_x = \sigma_y = \sigma$ and $\rho = 0$. We can write

$$\sum_{i=1}^{n} (x_i - \mu)^2 = n(\bar x - \mu)^2 + S_x,$$

$$\sum_{i=1}^{n} (y_i - \nu)^2 = n(\bar y - \nu)^2 + S_y,$$

where

$$S_x = \sum_{i=1}^{n} (x_i - \bar x)^2, \qquad S_y = \sum_{i=1}^{n} (y_i - \bar y)^2.$$





Then $n(\bar x - \mu)^2/\sigma^2$ and $n(\bar y - \nu)^2/\sigma^2$ are each distributed independently as $\chi^2$
with one degree of freedom and each of $S_x/\sigma^2$ and $S_y/\sigma^2$ follow the $\chi^2$-law with
n - 1 degrees of freedom. If we define

(1)   $$r = \sqrt{(\bar x - \mu)^2 + (\bar y - \nu)^2}, \qquad S_r = S_x + S_y,$$

then $nr^2/\sigma^2$ and $S_r/\sigma^2$ have independent $\chi^2$-distributions with 2 and 2n - 2 degrees
of freedom, respectively.
It follows from this that

(2)   $$R = \frac{nr^2}{2\sigma^2} \div \frac{S_r}{(2n - 2)\sigma^2} = n(n - 1)\frac{r^2}{S_r}$$

has the F-distribution with 2 and 2n - 2 degrees of freedom.
Let us define $F_\alpha$ so

(3)   $$\int_{F_\alpha}^{\infty} h_{2,2n-2}(F)\, dF = \alpha,$$

where $h_{2,2n-2}(F)$ is the F-distribution with 2 and 2n - 2 degrees of freedom and
$0 < \alpha < 1$. Then the probability is $\alpha$ that the sample statistic R is greater than
or equal to $F_\alpha$, i.e.,

(4)   $$P\{R \ge F_\alpha\} = \alpha.$$

In considering a sample value of R, at significance level $\alpha$, one rejects the hypothesis
of the means being $\mu$ and $\nu$, respectively, if R is larger than $F_\alpha$, i.e.,
larger than 1 and larger than the $\alpha$ significance point in Snedecor's tables [1].

This F-test is a straightforward generalization to the bivariate case of the
usual t-test as applied to the univariate case. In each case the sum of squares
of distances of the observations from the population mean is broken up into the
sum of squares of distances from the sample mean plus n times the square of the
distance from the sample mean to the population mean. The t-test for the univariate
case depends on the ratio of the distance of the sample mean from the
population mean to the square root of the sum of squares of distances from the
observations to the sample mean. The proposed F-test depends upon the ratio
of the square of the distance of the sample mean from the population mean to
the sum of squares of distances from the observations to the sample mean.

It can easily be shown that the likelihood ratio criterion for this hypothesis is

(5)   $$\lambda = \left[ \frac{\sum (x_i - \bar x)^2 + \sum (y_i - \bar y)^2}{\sum (x_i - \mu)^2 + \sum (y_i - \nu)^2} \right]^{n} = \left[ 1 + \frac{R}{n - 1} \right]^{-n}.$$

The hypothesis considered here is one of a class of hypotheses treated by Kolodziejczyk
[2] in a paper in which he considers the likelihood ratio criterion for a
set of general linear hypotheses.

Equation (4) may be written

(6)   $$P\{(\bar x - \mu)^2 + (\bar y - \nu)^2 \ge r_\alpha^2\} = \alpha,$$

where $r_\alpha^2 = F_\alpha (S_x + S_y)/[n(n - 1)]$. The probability is $\alpha$ that the distance
from the sample means $\bar x$, $\bar y$ to the population means $\mu$, $\nu$ is greater than or equal
to $r_\alpha$. We may call $r_\alpha$ the fiducial radius [3], and the equation $(\bar x - \mu)^2 +$
$(\bar y - \nu)^2 = r_\alpha^2$ defines the confidence region for the population means.
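As a hedged illustration of the fiducial radius, the sketch below computes $r_\alpha$ from a sample. It is not taken from the paper: the critical value $F_\alpha$ is obtained from the closed form $F_\alpha = (n-1)(\alpha^{-1/(n-1)} - 1)$, which follows from the standard upper-tail formula for an F-variate with 2 and 2n - 2 degrees of freedom, and the function name and data are hypothetical.

```python
import math


def fiducial_radius(xs, ys, alpha):
    """Radius r_alpha of the circular confidence region around (x_bar, y_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_r = (sum((x - x_bar) ** 2 for x in xs)
           + sum((y - y_bar) ** 2 for y in ys))             # S_x + S_y
    # Closed-form alpha point of F with 2 and 2n - 2 degrees of freedom.
    f_alpha = (n - 1) * (alpha ** (-1.0 / (n - 1)) - 1.0)
    return math.sqrt(f_alpha * s_r / (n * (n - 1)))


# Hypothetical sample, for illustration only.
print(fiducial_radius([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], alpha=0.05))
```

A point $(\mu, \nu)$ lies inside the region when its distance from the sample means is less than the returned radius.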

Suppose we have two samples of $n_1$ and $n_2$ pairs of observations, respectively,
from normal bivariate distributions. If the population mean of each x variate
is $\mu$ and the population mean of each y variate is $\nu$, the population variance of
each variate is $\sigma^2$, and the correlation coefficient is zero, then the sample means
$\bar x_1$ and $\bar y_1$ of the first sample and $\bar x_2$ and $\bar y_2$ of the second sample follow normal
distributions. Also $\bar x_1 - \bar x_2$ and $\bar y_1 - \bar y_2$ are normally distributed. Then, writing
$r''^2 = (\bar x_1 - \bar x_2)^2 + (\bar y_1 - \bar y_2)^2$, the quantity $n_1 n_2 r''^2/[(n_1 + n_2)\sigma^2]$ has the $\chi^2$-distribution with
2 degrees of freedom. Let

$$S_{r''} = \sum_{i=1}^{n_1} (x_{1i} - \bar x_1)^2 + \sum_{i=1}^{n_1} (y_{1i} - \bar y_1)^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar x_2)^2 + \sum_{i=1}^{n_2} (y_{2i} - \bar y_2)^2,$$

where $x_{1i}$, $y_{1i}$ $(i = 1, 2, \cdots, n_1)$ are the pairs of observations in the first sample
and $x_{2i}$, $y_{2i}$ $(i = 1, 2, \cdots, n_2)$ are the pairs of observations in the second sample.
$S_{r''}/\sigma^2$ is distributed according to the $\chi^2$-distribution with $(2n_1 + 2n_2 - 4)$ degrees
of freedom because it is the sum of quantities independently distributed
as $\chi^2$. Then

$$R = \frac{n_1 n_2 r''^2}{2(n_1 + n_2)\sigma^2} \div \frac{S_{r''}}{(2n_1 + 2n_2 - 4)\sigma^2} = \frac{n_1 n_2 (n_1 + n_2 - 2) r''^2}{(n_1 + n_2) S_{r''}}$$

has the F-distribution with 2 and $(2n_1 + 2n_2 - 4)$ degrees of freedom. This
fact yields us a significance test for the hypothesis that both the means of the
x variates and the means of the y variates for the two populations are the same.
We can also set up confidence regions for $\mu_1 - \mu_2$ and $\nu_1 - \nu_2$.

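Now let us consider a sample from a normal bivariate population with means
$\mu$ and $\nu$, variances $\sigma_x^2$ and $\sigma_y^2$, and correlation coefficient $\rho$. Suppose $\gamma = \sigma_x^2/\sigma_y^2$
and $\rho$ are known. The transformation

(7)   $$x' = \frac{x + \sqrt{\gamma}\, y}{\sqrt{2(1 + \rho)}},$$

(8)   $$y' = \frac{x - \sqrt{\gamma}\, y}{\sqrt{2(1 - \rho)}}$$

gives us the variates $x'$ and $y'$ which are distributed independently and with
variance $\sigma_x^2$. Writing $\mu'$ and $\nu'$ for the means of $x'$ and $y'$ and applying the results above we see that

(9)   $$R = n(n - 1)\frac{(\bar x' - \mu')^2 + (\bar y' - \nu')^2}{\sum (x'_i - \bar x')^2 + \sum (y'_i - \bar y')^2}
= n(n - 1)\frac{(\bar x - \mu)^2 - 2\rho\sqrt{\gamma}\,(\bar x - \mu)(\bar y - \nu) + \gamma(\bar y - \nu)^2}{\sum (x_i - \bar x)^2 - 2\rho\sqrt{\gamma} \sum (x_i - \bar x)(y_i - \bar y) + \gamma \sum (y_i - \bar y)^2}$$

has the F-distribution with 2 and 2n - 2 degrees of freedom. From this we
derive significance tests, fiducial radii, and confidence regions as before.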

The above distributions, significance tests, and confidence regions are easily
generalized to multivariate normal distributions. Suppose we have a sample of
n k-tuples of observations $\{x_{i\alpha}\}$ $(i = 1, 2, \cdots, k;\ \alpha = 1, 2, \cdots, n)$ from a
k-variate normal distribution. Let the expected value of each variate $x_i$ be zero
$(i = 1, 2, \cdots, k)$, the variance of each variate be $\sigma^2$ and each correlation coefficient
be zero. Then

(10)   $$R'' = \frac{n(n - 1) \sum_{i=1}^{k} \bar x_i^2}{\sum_{i=1}^{k} \sum_{\alpha=1}^{n} (x_{i\alpha} - \bar x_i)^2}$$

has the F-distribution with k and k(n - 1) degrees of freedom. Significance
tests, confidence regions, and fiducial radii follow from this fact.


3. Linear Regression. If one has a sample of n pairs of observations $(x_1, y_1;$
$x_2, y_2; \cdots; x_n, y_n)$ from a normal bivariate population and wishes to fit a
straight line to the scatter of sample points, one fits the line in such a way that
the sum of squares of distances from the sample points to the line is a minimum
("error in both variates").

It is easily shown that this line goes through the point whose coordinates are
the sample means $(\bar x, \bar y)$. If the slope of a line through $(\bar x, \bar y)$ is $\tan\theta$, the distance
from a sample point $(x_i, y_i)$ to the line is $(x_i - \bar x)\sin\theta - (y_i - \bar y)\cos\theta$.
The sum of squares of distances from sample points to the line is

$$\sin^2\theta\, S_x - 2 \sin\theta \cos\theta\, S_{xy} + \cos^2\theta\, S_y,$$

where

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y).$$

If we minimize the above expression with respect to $\theta$ we find

(11)   $$b = \tan\theta = \frac{S_y - S_x \pm \sqrt{(S_y - S_x)^2 + 4S_{xy}^2}}{2S_{xy}}.$$

Using the plus sign gives us $S_b$, the minimum sum of squared distances; using
the minus sign gives us $S_a$, the maximum sum of squared distances. (The latter
value of $\tan\theta$ is the negative reciprocal of the former.)

$S_b$ is the sum of squared distances perpendicular to the regression line and
$S_a$ is the sum of squared distances along the regression line. The sum $S_a + S_b$
is equal to $S_x + S_y$ which is the sum of squares of distances from the sample
points to the point $(\bar x, \bar y)$. We have thus decomposed $S_x + S_y$ into two components,
one perpendicular to the regression line and the other along the regression
line.
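The decomposition can be sketched in a few lines of code. This is an illustration rather than the authors' computation: the slope is taken from (11) with the plus sign, the perpendicular and along-line sums of squares are obtained by substituting $\tan\theta = b$ into the sum-of-squares expression, and the function name and sample points are hypothetical. The sketch assumes $S_{xy} \ne 0$.

```python
import math


def orthogonal_regression(xs, ys):
    """Return (b, s_along, s_perp) for the line minimizing perpendicular distances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sum((x - x_bar) ** 2 for x in xs)
    s_y = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Equation (11), plus sign: slope of the minimizing line (needs s_xy != 0).
    b = (s_y - s_x + math.hypot(s_y - s_x, 2.0 * s_xy)) / (2.0 * s_xy)
    denom = 1.0 + b * b
    s_perp = (b * b * s_x - 2.0 * b * s_xy + s_y) / denom    # minimum sum
    s_along = (s_x + 2.0 * b * s_xy + b * b * s_y) / denom   # maximum sum
    return b, s_along, s_perp


# Hypothetical points, for illustration only.
b, s_along, s_perp = orthogonal_regression([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 1.0, 3.0])
print(b, s_along, s_perp)
```

For any sample the two returned sums add up to $S_x + S_y$, which gives a quick consistency check on a hand computation.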

















The joint distribution of $S_a$ and $S_b$ may be derived from the Wishart distribution
of the sums of squares and cross products,¹

(12)   $$\frac{1}{4\pi\sigma^{2(n-1)}\,\Gamma(n - 2)} \begin{vmatrix} S_x & S_{xy} \\ S_{xy} & S_y \end{vmatrix}^{\frac{1}{2}(n-4)} e^{-\frac{1}{2}(S_x + S_y)/\sigma^2}.$$

Let us make the transformation

$$\begin{aligned}
S_x &= \cos^2\theta\, S_a + \sin^2\theta\, S_b,\\
S_y &= \sin^2\theta\, S_a + \cos^2\theta\, S_b,\\
S_{xy} &= \sin\theta \cos\theta\, (S_a - S_b).
\end{aligned}$$

The value of $\theta$ corresponds to the plus sign in (11). We find

$$S_a + S_b = S_x + S_y,$$

$$\begin{vmatrix} S_x & S_{xy} \\ S_{xy} & S_y \end{vmatrix} = S_a S_b.$$

The Jacobian of the transformation is $(S_a - S_b)$. Using these relations in (12)
and integrating out $\theta$ we derive the distribution of $S_a$ and $S_b$

(13)   $$\frac{1}{4\sigma^{2(n-1)}\,\Gamma(n - 2)} (S_a S_b)^{\frac{1}{2}(n-4)} e^{-\frac{1}{2}(S_a + S_b)/\sigma^2} (S_a - S_b).$$

It can be shown that $S_a$ and $S_b$ are the characteristic roots of the sample variance-covariance
matrix. The distribution (13) of the characteristic roots of a
variance-covariance matrix when the population correlation coefficient is zero
and the variances are equal has been demonstrated by P. L. Hsu [4].


As a test of correlation (i.e., test of significance of the regression coefficient)
we propose using the ratio

$$F' = S_a/S_b.$$

This ratio is the maximum ratio of the sum of squared deviations in one direction
to the sum of squared deviations in the perpendicular direction. It is intuitively
evident that this ratio is probably near unity if the null hypothesis is true, that
is, if the variances are equal and the correlation is zero. If the correlation
is not zero then the ratio is likely to be large.

From (13) we can deduce the distribution of $F'$ by transforming variables
and integrating out the extraneous one. This procedure yields us as the distribution
of $F'$

$$(n - 2)\, 2^{n-3}\, F'^{\frac{1}{2}(n-4)} (F' + 1)^{-(n-1)} (F' - 1).$$

If we make the transformation

$$F' = e^{2z'},$$








1 This distribution is equivalent to Fisher’s distribution of the sample variances and 
correlation coefficient when the population correlation coefficient is zero. 







we find the probability element of $z'$ to be

$$(n - 2)(\cosh z')^{-(n-1)}\, d(\cosh z').$$

After integrating we see the cumulative distribution of $z'$ is

$$1 - (\cosh z')^{-(n-2)}.$$

Critical values of $z'$ for various levels of significance may be determined from a
table of hyperbolic cosines. Table I gives some values of $z'$ and the corresponding
values of $F'$.


TABLE I
Percentage points for the z' (or F') distribution

         z' at significance level        F' at significance level
   n     .05     .01     .001        .20     .10     .05      .01       .001

   3    3.688   5.298   7.601       98.0    398     1600    40,000   4,000,000
   4    2.178   2.993   4.144       17.9    38.0    78.0    398      4,000
   5    1.656   2.216   2.993       9.59    16.5    27.4    84.2     398
   6    1.381   1.818   2.412       6.79    10.6    15.8    38.0     124
   7    1.207   1.572   2.059       5.43    7.92    11.2    23.2     61.4
   8    1.084   1.402   1.818       4.63    6.47    8.74    16.5     38.0
   9     .992   1.276   1.643       4.09    5.55    7.28    12.7     26.8
  10     .920   1.178   1.509       3.71    4.91    6.30    10.6     20.5
  11     .862   1.100   1.402       3.43    4.45    5.61    9.02     16.5
  12     .813   1.035   1.314       3.21    4.10    5.09    7.92     13.9
  13     .772    .980   1.241       3.03    3.82    4.68    7.10     12.0
  14     .736    .933   1.178       2.89    3.59    4.36    6.47     10.6
  15     .705    .892   1.124       2.76    3.41    4.10    6.00     9.47
  20     .593    .746    .933       2.36    2.81    3.27    4.45     6.47
  25     .522    .654    .814       2.13    2.48    2.84    3.70     5.10
  30     .471    .589    .732       1.98    2.28    2.57    3.25     4.32
  40     .402    .502    .621       1.80    2.02    2.23    2.73     3.47
  60     .324    .404    .498       1.61    1.76    1.91    2.24     2.71
 120     .226    .281    .345       1.39    1.49    1.57    1.75     2.00
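Entries of Table I can be reproduced directly from the cumulative distribution $1 - (\cosh z')^{-(n-2)}$. The sketch below is an illustration and not part of the paper: setting the upper tail $(\cosh z')^{-(n-2)}$ equal to $\alpha$ gives $z'_\alpha = \cosh^{-1}(\alpha^{-1/(n-2)})$ and $F'_\alpha = e^{2z'_\alpha}$. The function names are hypothetical.

```python
import math


def z_critical(n, alpha):
    """Value z' whose upper-tail probability (cosh z')**-(n - 2) equals alpha."""
    return math.acosh(alpha ** (-1.0 / (n - 2)))


def f_critical(n, alpha):
    """Corresponding critical value of F' = exp(2 z')."""
    return math.exp(2.0 * z_critical(n, alpha))


# Tabulate a few rows in the layout of Table I.
for n in (3, 4, 15):
    zs = [round(z_critical(n, a), 3) for a in (0.05, 0.01, 0.001)]
    fs = [round(f_critical(n, a), 2) for a in (0.20, 0.10, 0.05, 0.01, 0.001)]
    print(n, zs, fs)
```

Because the inversion is in closed form, rows for sample sizes not listed in the table can be generated in the same way.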





The use of F’ has been suggested here to test the hypothesis that the popula- 
tion correlation coefficient is zero when it is known that the variances of the two 
variates are the same, or, more generally, when the ratio of the two variances is 
known. This gives a test of significance of the regression coefficient when there 
is error in both variates if the ratio of the variances is known. The test arises 
from intuitive considerations. $F'$ can also be used to test the hypothesis that
$\rho = 0$ and $\sigma_x = \sigma_y$ (Hy in Hsu's paper). C. T. Hsu [5] and J. W. Mauchly [6]
have shown that the likelihood ratio criterion for this hypothesis is


= or) - ler) 












If we set the normal distribution function equal to a constant, we determine
a contour ellipse in the x, y-plane. Since these ellipses of constant probability
density are circles when $\rho = 0$ and $\sigma_x = \sigma_y$, Mauchly calls the test a test of circularity.
The same procedure as used to test whether these ellipses are circles can
be used to test whether the ellipses have major axes in a certain direction and
with a specified ratio of lengths of axes. Suppose we wish to test the hypothesis
that the major axis is inclined to the x axis at an angle $\theta$ and that the ratio of
lengths of the major axis to the minor axis is k. This is equivalent to the hypothesis
that $\rho = \rho_0$ and $\sigma_x^2 = \gamma_0 \sigma_y^2$. To do this we rotate coordinate axes of the
variables of the distribution (hence changing coordinates of all sample points)
through $\theta$ and change the scale of one of the new variables by the factor of k.
The transformation is

$$x = k x' \cos\theta - y' \sin\theta,$$
$$y = k x' \sin\theta + y' \cos\theta.$$

In terms of $x'$, $y'$ the null hypothesis is $\rho' = 0$, $\sigma_{x'} = \sigma_{y'}$, and one proceeds as
above. Of course, if $\gamma_0$ is known then this method can be used to test the null
hypothesis that $\rho = \rho_0$.


4. Illustrative Example. An application of the formulae given above may be
illustrated from the data in Table II, which gives two sets of electrical conductivity
measurements at different field strengths. The assumption that the two
variances are equal is thus reasonable.


TABLE II
Table of Pairs of Observations of Electrical Conductivity

   x_i    y_i        x_i    y_i

   5.0    5.1        5.5    5.1
   7.4    7.0        5.3    5.0
   7.0    7.7        4.7    4.4
   8.8    7.7        8.6    7.1
   7.8    6.8        7.5    7.3
   5.1    5.5        5.6    6.3
   6.6    7.4        7.4    6.5
   8.8    7.7


Is it reasonable to regard x and y as being independently distributed in the
population on the basis of these data?

The sums of squares and cross products of deviations from the means and the
calculated slope are:

$$S_x = 29.40, \qquad S_{xy} = 19.99,$$
$$S_y = 18.04, \qquad b = 0.7554.$$



















The maximized variance ratio is:

$$F' = \frac{S_x + 2bS_{xy} + b^2 S_y}{b^2 S_x - 2bS_{xy} + S_y} = \frac{69.89}{4.615} = 15.15,$$

$$z' = \tfrac{1}{2} \ln F' = 1.36.$$

Comparing with Table I for n = 15 we find this value of $z'$ very highly significant
(probability less than 0.001), and at this probability level and on basis
of our data, x and y cannot be considered to be independent in the population.

Since the regression is significant, it becomes of interest to compute the calculated
points $X_i$ and $Y_i$ which fall on the regression line

$$Y = 1.35 + 0.7554\, X,$$
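The figures of this example can be recomputed from the fifteen pairs in Table II. The sketch below is an illustration, not the authors' worksheet: the data are those of Table II, the slope uses (11) with the plus sign, and the tail probability uses the cumulative distribution of $z'$ given earlier, $P\{z' \ge z\} = (\cosh z)^{-(n-2)}$. Variable names are illustrative.

```python
import math

# Pairs (x_i, y_i) from Table II, left column pairs first, then right.
x = [5.0, 7.4, 7.0, 8.8, 7.8, 5.1, 6.6, 8.8, 5.5, 5.3, 4.7, 8.6, 7.5, 5.6, 7.4]
y = [5.1, 7.0, 7.7, 7.7, 6.8, 5.5, 7.4, 7.7, 5.1, 5.0, 4.4, 7.1, 7.3, 6.3, 6.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_x = sum((v - x_bar) ** 2 for v in x)
s_y = sum((v - y_bar) ** 2 for v in y)
s_xy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))

# Slope from equation (11), plus sign.
b = (s_y - s_x + math.hypot(s_y - s_x, 2.0 * s_xy)) / (2.0 * s_xy)

# Maximized variance ratio and its logarithmic transform.
f_prime = (s_x + 2.0 * b * s_xy + b * b * s_y) / (b * b * s_x - 2.0 * b * s_xy + s_y)
z_prime = 0.5 * math.log(f_prime)

# Upper-tail probability of z' under the null hypothesis.
p_value = math.cosh(z_prime) ** (-(n - 2))

print(round(s_x, 2), round(s_y, 2), round(s_xy, 2), round(b, 4))
print(round(f_prime, 2), round(z_prime, 2), p_value)
```

Small differences from the printed values in the last digit arise because the paper rounds the sums of squares before forming the ratio.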





corresponding to each observed point $x_i$, $y_i$. They are obtained from these
equations

$$X_i = \bar x + \frac{1}{1 + b^2}(x_i - \bar x) + \frac{b}{1 + b^2}(y_i - \bar y) = .637 x_i + .481 y_i - .65,$$

$$Y_i = \bar y + \frac{b}{1 + b^2}(x_i - \bar x) + \frac{b^2}{1 + b^2}(y_i - \bar y) = .481 x_i + .363 y_i + .86.$$

The minimized sum of squared deviations from the regression line (i.e., squared
distances between observed and calculated points) is the denominator of the
expression for $F'$ divided by the factor $(1 + b^2)$,

$$4.615/1.5706 = 2.94.$$


It should perhaps be pointed out that the tests of the means described in the first 
part of this paper are no longer applicable since we do not know the population 
correlation coefficient. 


REFERENCES 


[1] G. W. Snedecor, Statistical Methods, Iowa State College Press (1940), pp. 184-187.

[2] S. Kolodziejczyk, "On an important class of statistical hypotheses," Biometrika,
Vol. 27 (1935), pp. 161-190.

[3] R. A. Fisher, "The fiducial argument in statistical inference," Annals of Eugenics,
Vol. 6 (1935), pp. 391-398.

[4] P. L. Hsu, "On the distribution of roots of certain determinantal equations," Annals of
Eugenics, Vol. 9 (1939), pp. 250-258.

[5] C. T. Hsu, "On samples from a normal bivariate population," Annals of Math. Stat.,
Vol. 11 (1940), pp. 410-426.

[6] J. W. Mauchly, "Significance test for sphericity of a normal n-variate distribution,"
Annals of Math. Stat., Vol. 11 (1940), pp. 204-209.












SYMMETRIC TESTS OF THE HYPOTHESIS THAT THE MEAN OF 
ONE NORMAL POPULATION EXCEEDS THAT OF ANOTHER 


By Herbert A. Simon
Illinois Institute of Technology 


1. Introduction. One of the most commonly recurring statistical problems is
to determine, on the basis of statistical evidence, which of two samples, drawn
from different universes, came from the universe with the larger mean value of a
particular variate. Let $M_y$ be the mean value which would be obtained with
universe (Y) and $M_x$ be the mean value which would be obtained with universe
(X). Then a test may be constructed for the hypothesis¹ $M_y \ge M_x$.

If $x_1, \cdots, x_n$ are the observed values of the variate obtained from universe
(X), and $y_1, \cdots, y_n$ are the observed values obtained from universe (Y), then
the sample space of the points $E: (x_1, \cdots, x_n; y_1, \cdots, y_n)$ may be divided into
three regions $\omega_0$, $\omega_1$, and $\omega_2$. If the sample point falls in the region $\omega_0$, the
hypothesis $M_y \ge M_x$ is accepted; if the sample point falls in the region $\omega_1$, the
hypothesis $M_y \ge M_x$ is rejected; if the sample point falls in the region $\omega_2$,
judgment is withheld on the hypothesis. Regions $\omega_0$, $\omega_1$, and $\omega_2$ are mutually
exclusive and, together, fill the entire sample space. Any such set of regions
$\omega_0$, $\omega_1$, and $\omega_2$ defines a test for the hypothesis $M_y \ge M_x$.

In those cases, then, where the experimental results fall in the region $\omega_2$, the
test leads to the conclusion that there is need for additional data to establish a
result beyond reasonable doubt. Under these conditions, the test does not
afford any guide to an unavoidable or non-postponable choice. In the application
of statistical findings to practical problems it often happens, however, that
judgment cannot be held in abeyance; some choice must be made, even at
a risk of error. For example, when planting time comes, a choice must be made
between varieties (X) and (Y) of grain even if neither has been conclusively
demonstrated, up to that time, to yield a larger crop than the other. It is the
purpose of this paper to propose a criterion which will always permit a choice
between two experimental results, that is, a test in which the regions $\omega_0$ and $\omega_1$
fill the entire sample space. In the absence of a region $\omega_2$, any observed result
is interpreted as a definite acceptance or rejection of the hypothesis tested.


2. General characteristics of the criterion. Let us designate the hypothesis
$M_y \ge M_x$ as $H_0$ and the hypothesis $M_x > M_y$ as $H_1$. Then a pair of tests, $T_0$
and $T_1$, for $H_0$ and $H_1$ respectively must, to suit our needs, have the following
properties:

(1) The regions $\omega_{00}$ ($\omega_{00}$ is the region of acceptance for $H_0$, $\omega_{10}$ the region of
rejection for $H_0$; $\omega_{01}$ and $\omega_{11}$ the corresponding regions for $H_1$) and $\omega_{11}$ must
coincide; as must the regions $\omega_{10}$ and $\omega_{01}$. This correspondence means that when
$H_0$ is accepted, $H_1$ is rejected, and vice versa. Hence, the tests $T_0$ and $T_1$ are
identical, and we shall hereafter refer only to the former.

¹ This paper presupposes a familiarity with the theory of testing statistical hypotheses as
set forth by J. Neyman and E. S. Pearson [1].

(2) There must be no regions $\omega_{20}$ and $\omega_{21}$. This means that judgment is never
held in abeyance, no matter what sample is observed.

(3) The regions $\omega_{00}$ and $\omega_{10}$ must be so bounded that the probability of accepting
$H_1$ when $H_0$ is true (error of the first kind for $T_0$) and the probability of
accepting $H_0$ when $H_1$ is true (error of the second kind for $T_0$) are, in a certain
sense, minimized. Since $H_0$ and $H_1$ are composite hypotheses, the probability
that a test will accept $H_1$ when $H_0$ is true depends upon which of the simple
hypotheses that make up $H_0$ is true.

Neyman and Pearson [2] have proposed that a test, $T_\alpha$, for a hypothesis be
termed uniformly more powerful than another test, $T_\beta$, if the probability for $T_\alpha$
of accepting the hypothesis if it is false, or the probability of rejecting it if it is
true, does not exceed the corresponding probability for $T_\beta$, no matter which of
the simple hypotheses is actually true. Since there is no test which is uniformly
more powerful than all other possible tests, it is usually required that a test be
uniformly most powerful (UMP) among the members of some specified class
of tests.


3. A symmetric test when the two universes have equal standard deviations.
Let us consider, first, the hypothesis $M_y \ge M_x$ where the universes from
which observations of varieties (X) and (Y), respectively, are drawn are normally
distributed universes with equal standard deviations, $\sigma$, and means $M_x$ and
$M_y$ respectively. Let us suppose a sample drawn of n random observations from
the universe of variety (X) and a sample of n independent and random observations
from the universe of (Y). The probability distribution of points in the
sample space is given by

(1)   $$p(x_1, \cdots, x_n; y_1, \cdots, y_n) = (2\pi\sigma^2)^{-n} \exp\left\{ -\frac{1}{2\sigma^2} \left[ \sum_{i} (x_i - M_x)^2 + \sum_{i} (y_i - M_y)^2 \right] \right\}.$$

In testing the hypothesis $M_y \ge M_x$, there is a certain symmetry between the
alternatives (X) and (Y). If there is no a priori reason for choosing (X) rather
than (Y), and if the sample point $E_1: (a_1, \cdots, a_n; b_1, \cdots, b_n)$ falls in the region
of acceptance of $H_0$, then the point $E_2: (b_1, \cdots, b_n; a_1, \cdots, a_n)$ should fall in
the region of acceptance of $H_1$. That is, if $E_1$ is taken as evidence that $M_y \ge M_x$,
then $E_2$ can with equal plausibility be taken as evidence that $M_x \ge M_y$.

Any test such that $E_1: (a_1, \cdots, a_n; b_1, \cdots, b_n)$ lies in $\omega_0$ whenever
$E_2: (b_1, \cdots, b_n; a_1, \cdots, a_n)$ lies in $\omega_1$, and vice versa, will be designated a symmetric
test of the hypothesis $M_y \ge M_x$. Let $\Omega$ be the class of symmetric tests of $H_0$.
If $T_\alpha$ is a member of $\Omega$, and is uniformly more powerful than every other $T_\beta$
which is a member of $\Omega$, then $T_\alpha$ is the uniformly most powerful symmetric test of $H_0$.

The hypothesis $M_y \ge M_x$ possesses a UMP symmetric test. This may be
shown as follows. From (1), the ratio can be calculated between the probability
densities at the sample points $E: (x_1, \cdots, x_n; y_1, \cdots, y_n)$ and
$E': (y_1, \cdots, y_n; x_1, \cdots, x_n)$. We get

(2)   $$\frac{p(E)}{p(E')} = \exp\left\{ \frac{n}{\sigma^2} (\bar x - \bar y)(M_x - M_y) \right\},$$

where

$$\bar x = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar y = \frac{1}{n} \sum_{i=1}^{n} y_i.$$

Now the condition $p(E) > p(E')$ is equivalent to $\frac{n}{\sigma^2}(\bar x - \bar y)(M_x - M_y) > 0$.
Hence $p(E) > p(E')$ whenever $(\bar x - \bar y)$ has the same sign as $(M_x - M_y)$.

Now for any symmetric test, if E lies in $\omega_0$, $E'$ lies in $\omega_1$, and vice versa.
Suppose that, in fact, $M_y > M_x$. Consider a symmetric test, $T_\alpha$, whose region
$\omega_0$ contains a sub-region $\omega_{0U}$ (of measure greater than zero) such that $\bar y < \bar x$
for every point in that sub-region. Then for every point $E'$ in $\omega_{0U}$, $p(E') <$
$p(E)$, where E is the point symmetric to $E'$. Hence, a more powerful test, $T_\beta$, could be constructed which would be
identical with $T_\alpha$, except that $\omega_{1U}$, the sub-region symmetric to $\omega_{0U}$, would be
interchanged with $\omega_{0U}$ as a portion of the region of acceptance for $H_0$. Therefore,
a test such that $\omega_0$ contained all points for which $\bar y > \bar x$, and no others,
would be a UMP symmetric test. This result is independent of the magnitude
of $(M_y - M_x)$ provided only $M_y > M_x$. We conclude that $\bar y > \bar x$ is a uniformly
most powerful symmetric test for the hypothesis $M_y \ge M_x$.

The probability of committing an error with the UMP symmetric test is a
simple function of the difference $|M_x - M_y|$. The exact value can be found
by integrating (1) over the whole region of the sample space for which $\bar{y} < \bar{x}$.
There is no need to distinguish errors of the first and second kind, since an error of
the first kind with $H_0$ is an error of the second kind with $H_1$, and vice versa.
The probability of an error is one half when $M_x = M_y$, and in all other cases is
less than one half.
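The integration can be made concrete. Under (1) the difference $\bar{x} - \bar{y}$ is normally distributed about $M_x - M_y$ with variance $2\sigma^2/n$, so the error probability reduces to a normal tail area. This closed form is a standard consequence of (1) rather than a formula printed in the text, and the function names below are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def symmetric_test_error(delta, sigma, n):
    """Error probability of the rule 'accept M_y > M_x iff ybar > xbar'.

    delta is |M_x - M_y|, sigma the common standard deviation,
    n the number of observations on each variety.
    """
    return normal_cdf(-abs(delta) * math.sqrt(n / 2.0) / sigma)
```

For example, `symmetric_test_error(0.0, 1.0, 10)` is one half, and the value falls below one half as soon as the means differ.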


4. Relation of UMP symmetric test and test which is UMP of tests absolutely
equivalent to it. Neyman and Pearson [2] have shown the test $\bar{y} - \bar{x} > k$
to be UMP among the tests absolutely equivalent to it, for the hypothesis
$M_y > M_x$. They have defined a class of tests as absolutely equivalent if, for
each simple hypothesis in $H_0$, the probability of an error of the first kind is
exactly the same for all the tests which are members of the class. If $k$ be set
equal to zero, $\bar{y} > \bar{x}$, and their test reduces to the UMP symmetric test. What is
the relation between these two classes of tests?

If $T_\alpha$ be the UMP symmetric test, then it is clear from Section 2 that there is
no other symmetric test, $T_\beta$, which is absolutely equivalent to $T_\alpha$. Hence $\Omega$,
the class of symmetric tests, and A, the class of tests absolutely equivalent to
$T_\alpha$, have only one member in common, the test $T_\alpha$ itself. Neyman and
Pearson have shown $T_\alpha$ to be the UMP test of A, while the results of Section 4
show $T_\alpha$ to be the UMP test of $\Omega$.











152 HERBERT A. SIMON 


5. Justification for employing a symmetric test. In introducing Section 3, a
heuristic argument was advanced for the use of a symmetric, rather than an
asymmetric test for the hypothesis $M_y > M_x$. This argument will now be given
a precise interpretation in terms of probabilities.

Assume, not a single experiment for testing the hypothesis $M_y > M_x$, but a
series of similar experiments. Suppose a judgment to be formed independently
on the basis of each experiment as to the correctness of the hypothesis. Is
there any test which, if applied to the evidence in each case, will maximize the
probability of a correct judgment in that experiment? Such a test can be shown
to exist, providing one further assumption is made: that if any criterion be applied
prior to the experiment to test the hypothesis $M_y > M_x$, the probability of a
correct decision will be one half. That is, it must be assumed that there is no
evidence which, prior to the experiment, will permit the variety with the greater
yield to be selected with greater-than-chance frequency.

Consider now any asymmetric test for the hypothesis $H_0$, that is, any test
which is not symmetric. The criterion $\bar{y} - \bar{x} > k$, where $k > 0$, is an example
of such a test. Unlike a symmetric test, an asymmetric test may give a different
result if applied as a test of the hypothesis $H_0$ than if applied as a test of the
hypothesis $H_1$. For instance, a sample point such that $\bar{y} - \bar{x} = \epsilon$, where
$k > \epsilon > 0$, would be considered a rejection of $H_0$ and acceptance of $H_1$ if the above
test were applied to $H_0$; but would be considered a rejection of $H_1$ and an
acceptance of $H_0$ if the test were applied to $H_1$. Hence, before an asymmetric
test can be applied to a problem of dichotomous choice (a problem where $H_0$ or
$H_1$ must be determinately selected), a decision must be reached as to whether the
test is to be applied to $H_0$ or to $H_1$. This decision cannot be based upon the
evidence of the sample to be tested, for in this case the complete test, which
would of course include this preliminary decision, would be symmetric by
definition.

Let $H_c$ be the correct hypothesis ($H_0$ or $H_1$, as the case may be) and let $H_t$
be the hypothesis to which the asymmetric test is applied. Since by assumption
there is no prior evidence for deciding whether $H_c$ is $H_0$ or $H_1$, we may employ
any random process for deciding whether $H_t$ is to be identified with $H_0$ or $H_1$.
If such a random selection is made, it follows that the probability that $H_c$ and
$H_t$ are identical is one half.

We designate as the region of asymmetry of a test the region of points
$E_1: (a_1, \ldots, a_n; b_1, \ldots, b_n)$ and $E_2: (b_1, \ldots, b_n; a_1, \ldots, a_n)$ of aggregate measure
greater than zero such that $E_1$ and $E_2$ both fall in $w_0$ or both fall in $w_1$. Suppose
$w_{0a}$ and $w_{0b}$ are a particular symmetrically disposed pair of subregions of the
region of asymmetry, which fall in $w_0$ of a test $T_t$. Suppose that, for every
point, $E_1$, in $w_{0a}$, $\bar{b} > \bar{a}$, and that $w_{0a}$ and $w_{0b}$ are of measure greater than zero.
The sum of the probabilities that the sample point will fall in $w_{0a}$ or $w_{0b}$ is exactly
the same whether $H_c$ and $H_t$ are the same hypothesis or are contradictory
hypotheses. In the first case $H_c$ will be accepted, in the second case $H_c$ will be
rejected. These two cases are of equal probability, hence there is a probability
of one half of accepting or rejecting $H_c$ if the sample point falls in the region of
asymmetry of $T_t$. But from equation (2) of Section 2 above, we see that if the
subregions $w_{0a}$ and $w_{0b}$ had been in a region of symmetry, and if $w_{0a}$ had been in
$w_0$, the probability of accepting $H_c$ would have been greater than the probability
of rejecting $H_c$.

Hence, if it is determined by random selection to which of a pair of hypotheses
an asymmetric test is going to be applied, the probability of a correct judgment
with the asymmetric test will be less than if there were substituted for it the
UMP symmetric test. It may be concluded that the UMP symmetric test is to
be preferred unless there is prior evidence which permits a tentative selection of
the correct hypothesis with greater-than-chance frequency.
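The comparison can be illustrated with normal tail areas. The sketch below assumes the asymmetric criterion $\bar{y} - \bar{x} > k$ of this section, applied to $H_0$ or to $H_1$ with probability one half each, and a true difference `delta` $= M_y - M_x > 0$; the closed forms follow from the normality of $\bar{y} - \bar{x}$ and are not printed in the text.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_correct_symmetric(delta, sigma, n):
    """P(ybar > xbar) when the true difference M_y - M_x is delta > 0."""
    s = sigma * math.sqrt(2.0 / n)  # standard deviation of ybar - xbar
    return normal_cdf(delta / s)

def prob_correct_asymmetric(delta, sigma, n, k):
    """Asymmetric rule with margin k, applied to H_0 or H_1 at random.

    Applied to H_0, the correct decision needs ybar - xbar > k;
    applied to H_1, it needs ybar - xbar > -k.
    """
    s = sigma * math.sqrt(2.0 / n)
    return 0.5 * (normal_cdf((delta - k) / s) + normal_cdf((delta + k) / s))
```

With `k = 0` the two probabilities coincide; for any `k > 0` the randomly assigned asymmetric test is correct less often, in line with the conclusion above.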

6. Symmetric test when standard deviations of universes are unequal.
Thus far, we have restricted ourselves to the case where $\sigma_x = \sigma_y$. Let us now
relax this condition and see whether a UMP symmetric test for $M_y > M_x$ exists
in this more general case.

We now have for the ratio of $p(E)$ to $p(E')$:

(3)   $\frac{p(E)}{p(E')} = \exp\left\{-\frac{n}{2\sigma_x^2\sigma_y^2}\left[(\sigma_y^2 - \sigma_x^2)(\mu_x - \mu_y) - 2(\sigma_y^2 M_x - \sigma_x^2 M_y)(\bar{x} - \bar{y})\right]\right\},$

where

$\mu_x = \sum_i x_i^2/n, \qquad \mu_y = \sum_i y_i^2/n.$

Even if $\sigma_x$ and $\sigma_y$ are known, which is not usually the case, there is no UMP
symmetric test for the hypothesis $M_y > M_x$. From (3), the symmetric critical
region which has the lowest probability of errors of the first kind for the
hypothesis ($M_y = k_1$; $M_x = k_2$; $k_1 > k_2$) is the set of points $E$ such that:

(4)   $(\sigma_y^2 - \sigma_x^2)(\mu_x - \mu_y) - 2(\sigma_y^2 k_2 - \sigma_x^2 k_1)(\bar{x} - \bar{y}) > 0.$

Since this region is not the same for all values of $k_1$ and $k_2$ such that $k_1 > k_2$,
there is no UMP symmetric region for the composite hypothesis $M_y > M_x$.
This result holds, a fortiori, when $\sigma_x$ and $\sigma_y$ are not known.
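The dependence of region (4) on the particular pair $(k_1, k_2)$ can be seen on a single sample point. The sketch below assumes known variances; the sample and the parameter values are hypothetical, chosen so that the bracket of (3) changes sign between two admissible pairs.

```python
import math

def log_density_unequal(x, y, m_x, m_y, var_x, var_y):
    """Log joint density of n x's and n y's with separate variances."""
    n = len(x)
    return (-0.5 * n * math.log(2 * math.pi * var_x)
            - 0.5 * n * math.log(2 * math.pi * var_y)
            - sum((xi - m_x) ** 2 for xi in x) / (2 * var_x)
            - sum((yi - m_y) ** 2 for yi in y) / (2 * var_y))

def bracket(x, y, m_x, m_y, var_x, var_y):
    """Square bracket of (3); with M_y = k_1, M_x = k_2 it is the left side of (4)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    mu_x = sum(xi * xi for xi in x) / n
    mu_y = sum(yi * yi for yi in y) / n
    return ((var_y - var_x) * (mu_x - mu_y)
            - 2 * (var_y * m_x - var_x * m_y) * (xbar - ybar))

# Hypothetical sample point with sigma_x^2 = 1 and sigma_y^2 = 4.
x, y = [1.5, 1.5], [0.5, 0.5]
near = bracket(x, y, m_x=0.0, m_y=1.0, var_x=1.0, var_y=4.0)   # k_1 = 1,  k_2 = 0
far = bracket(x, y, m_x=9.0, m_y=10.0, var_x=1.0, var_y=4.0)   # k_1 = 10, k_2 = 9
direct = (log_density_unequal(x, y, 0.0, 1.0, 1.0, 4.0)
          - log_density_unequal(y, x, 0.0, 1.0, 1.0, 4.0))
closed = -len(x) / (2 * 1.0 * 4.0) * near
```

The same point satisfies (4) for the first pair of means and fails it for the second, although $k_1 > k_2$ in both cases.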

If there is no UMP symmetric test for $M_y > M_x$ when $\sigma_y \neq \sigma_x$, we must be
satisfied with a test which is UMP among some class of tests more restricted than
the class of symmetric tests. Let us continue to restrict ourselves to the case
where there are an equal number of observations, in our sample, of (X) and of
(Y). Let us pair the observations $x_i$, $y_i$, and consider the differences
$u_i = x_i - y_i$. Is there a UMP test among the tests which are symmetric with
respect to the $u_i$'s for the hypothesis that $M_y - M_x = -U > 0$? By a symmetric
test in this case we mean a test such that whenever the point $(u_1, \ldots, u_n)$
falls into region $w_0$, the point $(-u_1, \ldots, -u_n)$ falls into region $w_1$.

If $x_i$ and $y_i$ are distributed normally about $M_x$ and $M_y$ with standard
deviations $\sigma_x$ and $\sigma_y$ respectively, then $u_i$ will be normally distributed about
$U = M_x - M_y$ with standard deviation $\sigma_u = \sqrt{\sigma_x^2 + \sigma_y^2}$. The ratio of probabilities


for the sample points $E_u: (u_1, \ldots, u_n)$ and $E_u': (-u_1, \ldots, -u_n)$ is given by:

(5)   $\frac{p(E_u)}{p(E_u')} = \exp\left\{\frac{2n}{\sigma_u^2}\bar{u}U\right\},$

where

$\bar{u} = \frac{1}{n}\sum_i u_i.$

Hence, $p(E_u) > p(E_u')$ whenever $\bar{u}$ has the same sign as $U$. Therefore, by the
same process of reasoning as in Section 2, above, we may show that $\bar{u} < 0$ is a
UMP test among tests symmetric in the sample space of the $u$'s for the
hypothesis $U < 0$.

It should be emphasized that $\Omega_u$, the class of symmetric regions in the space
of $E_u: (u_1, \ldots, u_n)$, is far more restricted than $\Omega_x$, the class of symmetric regions
in the sample space of $E: (x_1, \ldots, x_n; y_1, \ldots, y_n)$. In the latter class are included
all regions such that:

(A) $E: (a_1, \ldots, a_n; b_1, \ldots, b_n)$ falls in $w_0$ whenever $E': (b_1, \ldots, b_n; a_1, \ldots, a_n)$
falls in $w_1$. Members of class $\Omega_u$ satisfy this condition together with the
further condition:

(B) For all possible sets of $n$ constants $k_1, \ldots, k_n$, $E: (x_1 + k_1, \ldots, x_n + k_n;
y_1 + k_1, \ldots, y_n + k_n)$ falls in $w_0$ whenever $E: (x_1, \ldots, x_n; y_1, \ldots, y_n)$ falls
in $w_0$. When $\sigma_y \neq \sigma_x$, a UMP test for $M_y > M_x$ with respect to the symmetric
class $\Omega_u$ exists, but a UMP test with respect to the symmetric class $\Omega_x$ does not
exist.


REFERENCES 


[1] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses," Phil. Trans. Roy. Soc., Series A, 702, Vol. 231 (1933), pp. 289-337.

[2] J. Neyman and E. S. Pearson, "The testing of statistical hypotheses in relation to
probabilities a priori," Proc. Camb. Phil. Soc., Vol. 29 (1933), pp. 492-510.













ON INDICES OF DISPERSION 


By Paul G. Hoel
University of California, Los Angeles 


1. Introduction. In biological sciences the index of dispersion for the binomial 
and Poisson distributions is very useful for testing homogeneity of certain types 
of data. For example, the dilution technique in making blood counts finds it 
useful. Recently there have been attempts to use it to determine allergies by 
observing the change in the blood count after allergic foods have been taken. 
Here the sample may consist of only a few readings; consequently it is important 
to know how accurate this index is when applied to small samples. After in- 
specting the application of the Poisson index to such counts, I was surprised to 
see the lack of agreement with theory. At first it appeared that the fault lay 
with the chi-square approximation which is used on this index, but later it was 
clear that the assumption of a basic Poisson distribution was at fault. It now 
appears that statisticians will need to be careful about citing blood counts as 
examples of data following a Poisson distribution. 

This paper is the result of investigating the accuracy of the chi-square ap- 
proximation for the distribution of these indices. Previous work on this problem 
seems to have consisted in some sampling experiments [1] for small values of 
the parameters involved, and in some theoretical work [2] in which the sampling 
distribution is considered only for a fixed sample mean. Although sampling 
distributions ordinarily differ very little from the distributions obtained by 
assuming the mean of the sample fixed, for small degrees of freedom the dif- 
ference may be appreciable and therefore requires investigation. In this paper 
the accuracy of the chi-square approximation is investigated by finding expressions
for the descriptive moments of the distribution which are correct to terms
of order $N^{-3}$. These expressions are obtained by means of Fisher's semi-invariant
technique.


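2. Moments of the distribution. Employing Fisher's notation [3], let the
binomial index of dispersion be denoted by $z$; then $z$ may be written as:

$z = \frac{\sum (x_i - \bar{x})^2}{\bar{x}\left(1 - \frac{\bar{x}}{n}\right)} = \frac{(N-1)k_2}{k_1\left(1 - \frac{k_1}{n}\right)} = \frac{(N-1)k_2}{\kappa_1\left(1 - \frac{\kappa_1}{n}\right)\left(1 + \frac{w}{\kappa_1}\right)\left(1 - \frac{w}{n - \kappa_1}\right)}.$

Letting $w = k_1 - \kappa_1$, $y = k_2$, and $b = \frac{N-1}{\kappa_1\left(1 - \frac{\kappa_1}{n}\right)}$, $z$ may be expanded as follows:

$z = by\left(1 + \frac{w}{\kappa_1}\right)^{-1}\left(1 - \frac{w}{n - \kappa_1}\right)^{-1} = b\{y + c_1wy + c_2w^2y + c_3w^3y + \cdots\},$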


where the definition of $c_i$ is obvious. As will be seen later, these expansions are
valid for obtaining the expected values of powers of $z$; hence

(1)
$E(z) = b\{\mu_{01} + c_1\mu_{11} + c_2\mu_{21} + \cdots\}$
$E(z^2) = b^2\{\mu_{02} + 2c_1\mu_{12} + (2c_2 + c_1^2)\mu_{22} + (2c_3 + 2c_1c_2)\mu_{32} + \cdots\}$
$E(z^3) = b^3\{\mu_{03} + 3c_1\mu_{13} + (3c_2 + 3c_1^2)\mu_{23} + (3c_3 + 6c_1c_2 + c_1^3)\mu_{33} + \cdots\}$
$E(z^4) = b^4\{\mu_{04} + 4c_1\mu_{14} + (4c_2 + 6c_1^2)\mu_{24} + (4c_3 + 12c_1c_2 + 4c_1^3)\mu_{34} + \cdots\}.$

Since only the first four moments of $z$ are to be found, it will be necessary to
evaluate the $\mu_{ij}$ for $j = 1, 2, 3, 4$ and for $i = 0, 1, 2, \ldots$ as far as necessary to
give the desired degree of accuracy.
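The numerical coefficients in (1) are those of the powers of the series $1 + c_1w + c_2w^2 + c_3w^3 + \cdots$. A short check with exact rational arithmetic is sketched below; the values assigned to $c_1, c_2, c_3$ are arbitrary placeholders, not values from the paper.

```python
from fractions import Fraction

def poly_mul(a, b, max_deg):
    """Multiply two coefficient lists, discarding powers above max_deg."""
    out = [Fraction(0)] * (max_deg + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= max_deg:
                out[i + j] += ai * bj
    return out

def series_power(c, k, max_deg=3):
    """Coefficients of (1 + c[0] w + c[1] w^2 + ...)^k up to w^max_deg."""
    base = [Fraction(1)] + list(c)
    result = [Fraction(1)] + [Fraction(0)] * max_deg
    for _ in range(k):
        result = poly_mul(result, base, max_deg)
    return result

# Arbitrary placeholder values for c_1, c_2, c_3.
c1, c2, c3 = Fraction(1, 3), Fraction(-2, 5), Fraction(7, 11)
second = series_power([c1, c2, c3], 2)
third = series_power([c1, c2, c3], 3)
fourth = series_power([c1, c2, c3], 4)
```

The entry `fourth[3]`, for instance, is the multiplier of $\mu_{34}$ in $E(z^4)$.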

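First consider the relation between the moments $\mu_{ij}$ and the semi-invariants
$\kappa_{ij}$, which are defined in terms of the $\mu_{ij}$ by the following formal identity in $t$ and $\tau$.

$\exp\left\{\kappa_{10}t + \kappa_{01}\tau + \frac{\kappa_{20}t^2 + 2\kappa_{11}t\tau + \kappa_{02}\tau^2}{2!} + \cdots\right\} = 1 + \mu_{10}t + \mu_{01}\tau + \frac{\mu_{20}t^2 + 2\mu_{11}t\tau + \mu_{02}\tau^2}{2!} + \cdots$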









Differentiating both sides with respect to $t$ and replacing the exponential factor
by the right member gives an identity which is convenient for evaluating the
$\mu_{i0}$. Differentiating both sides with respect to $\tau$ and making the same
replacement gives an identity which is convenient for evaluating the $\mu_{ij}$ for $j > 0$.
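Equating coefficients in the differentiated identity gives a recurrence for the $\mu_{ij}$. The sketch below implements that recurrence exactly, without the order-of-$N$ truncation used in the paper; `kappa` is any user-supplied function returning $\kappa_{rs}$, and the check uses the case of two identical Poisson variables with unit mean, whose joint semi-invariants are all equal to one (an illustrative choice, not from the text).

```python
from fractions import Fraction
from math import comb

def joint_moments(kappa, imax, jmax):
    """Moments mu[(i, j)] from joint semi-invariants kappa(r, s).

    For i > 0 the t-derivative of the identity is used:
      mu[i, j] = sum_{a<i, b<=j} C(i-1, a) C(j, b) kappa(a+1, b) mu[i-1-a, j-b];
    for i = 0 the tau-derivative is used in the same way.
    """
    mu = {(0, 0): Fraction(1)}
    for j in range(jmax + 1):
        for i in range(imax + 1):
            if i == 0 and j == 0:
                continue
            total = Fraction(0)
            if i > 0:
                for a in range(i):
                    for b in range(j + 1):
                        total += (comb(i - 1, a) * comb(j, b)
                                  * kappa(a + 1, b) * mu[(i - 1 - a, j - b)])
            else:
                for b in range(j):
                    total += comb(j - 1, b) * kappa(0, b + 1) * mu[(0, j - 1 - b)]
            mu[(i, j)] = total
    return mu

# Identical Poisson(1) variables: every kappa(r, s) equals 1 and
# mu[(i, j)] is the Bell number of order i + j.
bell = joint_moments(lambda r, s: 1, 3, 3)
```

When $\kappa_{10} = 0$, as for the variable $w$ here, the recurrence gives for instance $\mu_{31} = \kappa_{31} + 3\kappa_{11}\mu_{20} + \kappa_{01}\mu_{30}$ with no terms neglected.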

These identities express $\mu_{ij}$ as a sum of products of $\kappa$'s and $\mu$'s, each such
product being of total degree $i$ and $j$ in its subscripts. By repeated substitution,
$\mu_{ij}$ can be expressed as a sum of products of $\kappa$'s only. From Fisher's formulas
each such semi-invariant, $\kappa_{rs}$, can be expressed as a sum of products of
semi-invariants of the basic distribution, each term of which sum is of order $N^{-(r+s-1)}$
in $N$. Hence it follows that the lowest order term, or at least one of the lowest
order terms, in $N$ in the expression for $\mu_{ij}$ will be a term with the maximum
number of $\kappa$ factors. Since the $\kappa_{rs}$ of lowest degree in subscripts are $\kappa_{10}$ and $\kappa_{01}$,
the term with the maximum number of $\kappa$ factors will be the term in $\kappa_{10}^i\kappa_{01}^j$.
However, since $w = k_1 - \kappa_1$ has a zero mean value, $\mu_{10} = \kappa_{10} = 0$; consequently
the lowest degree term involving the subscript $i > 0$ is $\kappa_{20}$ or $\kappa_{11}$. As a result,
the maximum number of $\kappa$ factors will be found in the term containing $\kappa_{20}^{i/2}\kappa_{01}^j$
for $i$ even and $\kappa_{20}^{(i-1)/2}\kappa_{11}\kappa_{01}^{j-1}$ for $i$ odd. These terms are of order $N^{-i/2}$ and $N^{-(i+1)/2}$
respectively. Since it is desired to obtain accuracy of order $N^{-3}$, it therefore
will suffice to evaluate $\mu_{ij}$ for $i \leq 6$.

The validity of the expansions used in arriving at (1) could now be shown by
writing them as partial sums with remainder terms and then showing that the
remainder terms are of higher order than $N^{-3}$.

Neglecting terms of higher order than $N^{-3}$, the above identities give the
following expressions for $\mu_{ij}$ for $j = 0, 1, 2$ and $i = 0, 1, \ldots, 6$, with slightly longer
expressions for $j = 3$ and 4.


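$\mu_{10} = 0$                                    $\mu_{01} = \kappa_{01}$
$\mu_{20} = \kappa_{20}$                                  $\mu_{11} = \kappa_{11}$
$\mu_{30} = \kappa_{30}$                                  $\mu_{21} = \kappa_{21} + \kappa_{01}\mu_{20}$
$\mu_{40} = \kappa_{40} + 3\kappa_{20}\mu_{20}$                     $\mu_{31} = \kappa_{31} + 3\kappa_{11}\mu_{20} + \kappa_{01}\mu_{30}$
$\mu_{50} = 6\kappa_{30}\mu_{20} + 4\kappa_{20}\mu_{30}$                  $\mu_{41} = 6\kappa_{21}\mu_{20} + 4\kappa_{11}\mu_{30} + \kappa_{01}\mu_{40}$
$\mu_{60} = 5\kappa_{20}\mu_{40}$                              $\mu_{51} = 5\kappa_{11}\mu_{40} + \kappa_{01}\mu_{50}$
                                                  $\mu_{61} = \kappa_{01}\mu_{60}$

$\mu_{02} = \kappa_{02} + \kappa_{01}\mu_{01}$
$\mu_{12} = \kappa_{12} + \kappa_{11}\mu_{01} + \kappa_{01}\mu_{11}$
$\mu_{22} = \kappa_{22} + \kappa_{21}\mu_{01} + \kappa_{02}\mu_{20} + 2\kappa_{11}\mu_{11} + \kappa_{01}\mu_{21}$
$\mu_{32} = \kappa_{31}\mu_{01} + 3\kappa_{12}\mu_{20} + 3\kappa_{21}\mu_{11} + \kappa_{02}\mu_{30} + 3\kappa_{11}\mu_{21} + \kappa_{01}\mu_{31}$
$\mu_{42} = 6\kappa_{21}\mu_{21} + \kappa_{02}\mu_{40} + 4\kappa_{11}\mu_{31} + \kappa_{01}\mu_{41}$
$\mu_{52} = 5\kappa_{11}\mu_{41} + \kappa_{01}\mu_{51}$
$\mu_{62} = \kappa_{01}\mu_{61}.$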


The next step is to apply Fisher's formulas expressing the $\kappa_{rs}$ in terms of the
semi-invariants of the basic variable distribution, which in this case is the
binomial distribution. In Fisher's notation $\kappa_{rs}$ would be written as $\kappa(1^r2^s)$, since
the variables $w$ and $y$ are respectively $k_1$, measured from its expected value, and
$k_2$. Applying such formulas, the following expressions for the $\mu_{i1}$ and $\mu_{i2}$ are
obtained, with somewhat longer expressions for the $\mu_{i3}$ and $\mu_{i4}$.








(2)





2 
Ho. = Ke, wu = wn = 35 +5 
4 7 4 3x3 
ya = 3 + Tr Ma = a + 8 SS, 
_ 25x3 K2 _ 15x: 
Me = N3? He = We 


M22 


Ke = 


He = 


—- 42] 2 
-*4e[725 41] 


os 2k3 Ke 2 
Miz = > +308) 25 +1] 


K5 


N? 


N® 


5ks Ke 
N3 
a 16x4 Ke 


N3 


Bt 8e( ata 


N 

wf 2 Pe 2 
a2, ded. +] 
7kak 7 kg Kk 
a + oe att 


4 Man sn 4 a8 a 4 | 


+ 








- 40xs Ka 





N3 


15ks 


‘N° 


It is necessary to express these $\kappa$'s in terms of the parameters of the binomial
distribution. Here the $\kappa$'s are defined by the following formal identity in $\theta$,

$e^{\kappa_1\theta + \kappa_2\theta^2/2! + \kappa_3\theta^3/3! + \cdots} = (q + pe^{\theta})^n.$

Taking logarithms, expanding in powers of $\theta$, and equating coefficients of powers
of $\theta$, the following expressions are obtained:

$\kappa_1 = m$
$\kappa_2 = mq$
$\kappa_3 = mq(q - p)$
$\kappa_4 = mq(1 - 6pq)$
$\kappa_5 = mq(q - p)(1 - 12pq)$
$\kappa_6 = mq(1 - 30pq + 120p^2q^2)$
$\kappa_7 = mq(q - p)(1 - 60pq + 360p^2q^2)$
$\kappa_8 = mq(1 - 126pq + 1680p^2q^2 - 5040p^3q^3).$
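These expressions can be confirmed with exact arithmetic. The sketch below assumes that $m$ denotes the binomial mean $np$ (an assumption consistent with the table of Section 3, where $n = 3m$), computes the raw moments of a binomial distribution directly, and converts them to semi-invariants with the standard moment recurrence; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_raw_moments(n, p, order):
    """Exact raw moments E(x^r), r = 0..order, of a binomial(n, p) variable."""
    q = 1 - p
    return [sum(Fraction(x) ** r * comb(n, x) * p ** x * q ** (n - x)
                for x in range(n + 1))
            for r in range(order + 1)]

def cumulants_from_raw(mom):
    """kappa_r = m_r - sum_{k=1}^{r-1} C(r-1, k-1) kappa_k m_{r-k}."""
    kap = [Fraction(0)] * len(mom)
    for r in range(1, len(mom)):
        kap[r] = mom[r] - sum(comb(r - 1, k - 1) * kap[k] * mom[r - k]
                              for k in range(1, r))
    return kap

def closed_form(n, p):
    """kappa_1 .. kappa_8 from the expressions above, with m = n p assumed."""
    q = 1 - p
    m, pq = n * p, p * q
    return [m,
            m * q,
            m * q * (q - p),
            m * q * (1 - 6 * pq),
            m * q * (q - p) * (1 - 12 * pq),
            m * q * (1 - 30 * pq + 120 * pq ** 2),
            m * q * (q - p) * (1 - 60 * pq + 360 * pq ** 2),
            m * q * (1 - 126 * pq + 1680 * pq ** 2 - 5040 * pq ** 3)]

p = Fraction(1, 3)
computed = cumulants_from_raw(binomial_raw_moments(6, p, 8))[1:]
expected = closed_form(6, p)
```

The two lists agree exactly for this choice of $n = 6$, $p = 1/3$, and the same comparison can be repeated for other parameters.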



















These values of the $\kappa$'s are inserted in (2) to give the following expressions for
the $\mu_{i1}$ and $\mu_{i2}$, with considerably longer expressions for the $\mu_{i3}$ and $\mu_{i4}$:


Hol 


Mil 


21 


31 


Msi 


Hei 


Mo2 


Mi2 


pee 


32 


M42 


M52 


He2 





mq 
1 
mq (q — p) W 
6 
ma(? ae + =) 
— 12 4 
mq(q — o(* 288 —— Pay iy 
11 — 58pq , 3mq 
6 (5 09-%) 
25 

m' qq — P) Hi 

15 

m‘ qt — Wi 
1 sabato 
1 Spq +; _ + ma) 

- ne 4m _ a) 
ma(q (= - tNwW-)h* 

1 — 30pq + 120p'q° , 8mq(1 — 5pq) 
m (=e eee + NN 1) 

mq(5 — 26pq) , _2m'g? , m’ £) 
+ tyww-pt WwW 


VWe~- £2 Bay «+ SS 
“oe Di we TNA —1) * NF 


33/36 — 176pq 6mq =) 
med (  tTmN-1)* 


40 
m‘q'(q ~ W3 


mq? 2 
= s 


It remains to express the coefficients of (1) in terms of these same parameters.
From the definition of $c_i$, $b$, and $\kappa_1$ it follows that

$c_i = \frac{p^{i+1} + (-1)^i q^{i+1}}{(mq)^i}, \qquad b = \frac{N-1}{mq}.$











If now the above values of the $\mu_{ij}$ and $c_i$ are inserted in the expressions (1), the
following final formulas are obtained.


Be) =(v- {i+ 2 +(2) +(2) + +} 














2) _ _ 42 ; 2 — 21 — 6pq) pq(2 — 11pq) 
E(z’) = (N 1) \! + vi (VN — 1)Nmg —1)Nmq + ~~ (Nmq)? 
_ 2(1 + 2pq — 25p"q*) , 2pq(l + 3pq— 30p*a') -} 
(N — 1)(Nmq)? (Nmq)* 
lie all 6 _ 3pq9 8  _ 6(1 — 3pq) 
ee ee (+a Nmg + (N—1)* (VN —1)Nmq 
4. 2g — 5pq) , 4(1 — 4pq)(N - 2) _ 24(1 — 5pq) 
(Nmq)? (N — 1)*Nmq (N — 1)?Nmq 
_ 6(1 — 11pgq + 40p°¢ 4 6pq(1 — 16pq + 55p"q’) 
3) (N — 1)(Nmq)? (Nmq)* 
4 60pq(1 — 4pq)(N - 2), | 
(N — 1)?(Nmq)? i 
a 12 8pq 44 _—«:12(1 + 2pq) 
_ 2pq(2 — 21pq) * 16(1 — 4pq)(N — 2) + 48 _ 8(15 — 46pq) 
(Nmq)? (N — 1)*Nmq (N—1)* (N—1)*Nmg 
_ 12(3 — 44pq + 138p'q°) , 64pq(1 — 4pq)(N — 2) 
(N — 1)(Nmq)? (N — 1)?(Nmq)* 
i 96(1 — 4pq)(N — 2) ‘ 8(1 — 12pq + 36p’q°)(4N* — 9N + 6) 
(N — 1)*Nmq (N — 1)(Nmq)? 
4pq(1 — 43pq + 168p* q°) +} 
+ a + , 


By considering the formation of terms, it can also be shown that the above 
expressions are correct to terms of order m’*, m’, m', and m’, respectively, in the 
parameter m. If m is large these expressions are considerably more accurate 
than the order N * would indicate since the lowest order terms neglected in these 
expressions are respectively N*m‘, N‘m*, N‘m’, and N‘m. 





3. Applications. To compare these moments with those of the chi-square
distribution, consider the ratios of corresponding moments, both for the Poisson
distribution and for the binomial distribution in the special case of $p = \frac{1}{3}$.
































For the Poisson distribution, these ratios are 


R, = 1 
1 1 
a a 
1 4 
i “ek 


7 
“= +cat + a wal: 
For the binomial distribution with $p = \frac{1}{3}$, these ratios are


1 1 
Ri =1+5- at war + Wa 





— Me ee 
=” (-! n (1 + oNn ava) 4(Nn)* 
htm =a oe 13 5 so NE in: Ni ae 
r= (1 (1 i) +a (t- 2n +323) + wig (1 2) + aaa 
_ N 7 37 «17 
Baity4{? +5 2? = yw (-3 + % a4) 


1 67 1 /31 51 17 
ei(u-2+$ S)+ Jf - &) + sata 


From these expressions the following table is constructed. 


m n N Ri R, R3 R, 

25 oo 3 1 99 .97 1.01 

25 75 3 1 1 .98 .97 
5 oo 5 1 96 .94 1.08 
5 15 5 1.01 .96 .87 .84 
2 co © 1 1 1.25 1 

2 oo 10 1 95 1.05 1.19 
2 oo 5 1 89 .85 1.21 
2 6 a 1 .83 .59 .69 
2 6 10 1.02 .87 .64 .64 
2 6 5 1.03 .90 .69 .62 
1 © 25 1 96 1.34 1.22 
1 2 10 1 89 1.10 1.39 
1 oo 5 1 .76 .70 1.44 
1 3 25 1.01 .69 dl 41 
1 3 10 1.03 .72 .39 .38 
1 3 5 1.07 77 41 .36 
















For m > 5 these ratios are close to unity even for N as small as 5; hence it 
appears that the chi-square approximation is satisfactory as long as m > 5. 

For $m < 2$ most of these ratios differ considerably from unity, particularly
for the binomial distribution. Since $R_1$ is practically constant, the reduction
in $R_2$ here indicates that the chi-square approximation will contain too many
extreme values. For the Poisson distribution there is an increase in $R_4$ to
compensate slightly for this decrease in $R_2$ so that the 5 percent points, for example,
would not differ very much. The use of the chi-square approximation would
therefore tend to give slightly too few significant results when they exist. For
the binomial distribution, however, there is a decrease in both $R_3$ and $R_4$, so
that the distribution tends toward normality; consequently the chi-square
approximation will contain far too many extreme values and the 5 percent point
will be much too large. This situation becomes slightly worse with increasing $N$.


4. Conclusions. From a consideration of the approximations for the first 
four moments of the distribution of the index of dispersion, it appears that the 
chi-square approximation is highly satisfactory provided that m > 5. For 
smaller values of m, the approximation is still fairly accurate for the Poisson 
distribution but not for the binomial distribution. For decreasing small values 
of m there is an increasing tendency to claim compatibility between data and 
theory when it does not exist; hence the binomial index must be handled care- 
fully in such situations. These general conclusions are in agreement with the 
specialized results of Cochran and Sukhatme. 

The semi-invariant technique for problems such as this is exceedingly laborious 
and is of questionable accuracy. The coefficients in Fisher’s heavier formulas 
are so large that increased accuracy comes slowly with increased accuracy of 
order of terms. In addition, there are numerous typographical mistakes in 
Fisher’s formulas, some of which are not easily detected. The formulas (3) 
may be used to investigate the accuracy of the chi-square approximation for 
situations not covered in the numerical table, but they are of questionable 
accuracy, when m is small, for N as small as 5. 


REFERENCES 


[1] P. V. Sukhatme, "On the distribution of chi-square in samples of the Poisson series,"
Jour. Roy. Stat. Soc., Vol. 101 (1938), pp. 75-79.

[2] W. G. Cochran, "The chi-square distribution for the binomial and Poisson series with
small expectations," Annals of Eugenics, Vol. 7 (1936), pp. 207-17.

[3] R. A. Fisher, "Moments and product moments of sampling distributions," Proc. London
Math. Soc., Series 2, Vol. 30 (1930), pp. 199-238.




ON SERIAL NUMBERS 


By E. J. Gumbel
New School for Social Research 


In this paper we consider a continuous variate and unclassified observations. 
It is well known that there are two step functions, which we may trace for a given 
series of observations. We will show that the differences between the two ways 
of plotting play an important role for certain graphical methods used by
engineers.

To obtain one and only one series of observations we adjust the cumulative 
frequencies. The corrections thus introduced depend upon the theoretical dis- 
tribution which is adequate for the observations. Later we deal with the rela- 
tion between serial numbers and grades. Finally we construct confidence bands 
for the comparison between theory and observations. 


1. Theory and observations. If we arrange $n$ observations in order of
increasing magnitude, and write each as often as it occurs, there will be a first,
$x_1$, the smallest value, a second, $x_2$, an $m$th, $x_m$, the penultimate, $x_{n-1}$, and the
last, $x_n$, i.e., the greatest value. The index $m$ is called the observed cumulative
frequency, or simply the rank. It is usual to draw the observations $x_m$ along the
abscissa, and the rank $m$ along the ordinate. The step function starts with a
vertical line from the value $x_1$ of the abscissa to the point with the coordinates
$1, x_1$ and in general consists of the horizontal lines from the point $m, x_m$ to the
point $m, x_{m+1}$ and the vertical lines from the point $m, x_{m+1}$ to the point $m + 1,
x_{m+1}$. The step function ends with the point $n, x_n$. We call this graph the
step function $(m, x_m)$. However, another step function which is derived from
the observations arranged in decreasing magnitude is equally legitimate. This
step function starts from the point with the coordinates $0, x_1$, and in general
consists of the horizontal lines from the point $m - 1, x_m$ to $m - 1, x_{m+1}$ and the
vertical lines from the point $m - 1, x_{m+1}$ to the point $m, x_{m+1}$ and ends with the
point $n - 1, x_n$. We call it the step function $(m - 1, x_m)$. Let $F(x)$ be the
probability of a value equal to or less than $x$. Then the continuous theoretical
curve, the ogive, which we compare to the step functions is $nF(x), x$. The
question is whether we have to use the step function $(m, x_m)$ or the step function
$(m - 1, x_m)$.

The differences between the two ways of plotting are rarely mentioned in the 
statistical literature. If we plot instead of the rank m the reduced rank m/n, 
the differences between the two ways of plotting are of the order 1/n. It is 
generally tacitly assumed that this difference may be neglected. This will not 
hold if n is small. 

In the following we show two other ways of plotting the observations where
the differences between the two observed curves play an important role. Suppose
that the probability $F(x)$ and the density of probability, $f(x)$, henceforth
called the distribution, are such that it is possible to introduce a reduced variate

(1)   $z = \frac{x - a}{b}$

which has no dimension. In general, the constant $a$ will be a certain mean, and
the constant $b$ a certain measure of dispersion. Furthermore, the constants may
be linear functions of these characteristics. Neither the probability $G(z)$ of a
value equal to or less than $z$

(2)   $G(z) = F(x),$

nor the reduced distribution

(3)   $g(z) = bf(a + bz)$

contain constants. The equiprobability test consists in the following procedure:
We attribute to the $m$th observation $x_m$ the relative frequency $m/n$, and
determine from a probability table a value $z$ such that

(4)   $G(z) = m/n.$

The variate $x$ is plotted on the ordinate, and the reduced variate $z$ on the
abscissa. Then the points $x_m, z$ must be situated close to the straight line (1).
To apply this comparison between theory and observations, we need not even
calculate the constants. For the normal distribution the application of this test
is greatly facilitated by the use of probability paper.

The difficulty is that we may as well choose the frequency 


(4’) G(z) = (m — 1)/n, 


and determine the corresponding values of z. Therefore, we have two lines (1) 
instead of one. The difference between the two series will be large for the 
first and last few observations. For the first series the last observation cannot 
be plotted on probability paper; for the second series the first observation can- 
not be plotted. 
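The two series of reduced variates can be computed directly. The sketch below assumes a normal reduced distribution and uses the inverse normal distribution function from the Python standard library in place of a probability table; the function name and the use of `None` for unplottable points are illustrative choices.

```python
from statistics import NormalDist

def reduced_variates(n, use_m_minus_1=False):
    """Normal reduced variate z solving G(z) = m/n, or (m - 1)/n, for m = 1..n.

    Returns None where the frequency is 0 or 1, since no finite z exists
    and the observation cannot be plotted.
    """
    shift = 1 if use_m_minus_1 else 0
    standard_normal = NormalDist()
    values = []
    for m in range(1, n + 1):
        freq = (m - shift) / n
        values.append(standard_normal.inv_cdf(freq) if 0.0 < freq < 1.0 else None)
    return values

first_series = reduced_variates(5)                       # frequencies m/n, as in (4)
second_series = reduced_variates(5, use_m_minus_1=True)  # frequencies (m - 1)/n, as in (4')
```

The second series is the first one shifted by one rank, so the same $z$ is paired with $x_m$ in one case and with $x_{m+1}$ in the other.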

The same difficulty exists for the "return period." If the observations of a
continuous variate are made at regular intervals in time which are taken as units,
we may as in [4] define the theoretical return periods $T(x)$ of a value equal to or
greater than $x$ as

(5)   $T(x) = \frac{1}{1 - F(x)}.$


The comparison of the theoretical with the observed return periods gives a test
for the validity of a theory. However, there are two series of observations,
namely, the exceedance intervals

(6)   ${}'T(x_m) = \frac{n}{n - m}; \qquad m = 1, 2, \ldots, n - 1$

and the recurrence intervals

(7)   ${}''T(x_m) = \frac{n}{n - m + 1}; \qquad m = 1, 2, \ldots, n.$

The two expressions (6) and (7) differ widely for the high ranks. The penultimate
observation, for example, has an exceedance interval $n$, whereas the
recurrence interval is only $n/2$. This contradiction and the difficulty arising for the
equiprobability test show that the question of choosing the observed cumulative
frequency of the $m$th observation has a practical significance.

The equiprobability test and the comparison between the observed and the
theoretical return period may be combined on probability paper. The variate $x$
is plotted on the ordinate, the reduced variate $y$ on the abscissa. But instead of
$y$ we write the probability $F(x)$ and the return period $T(x)$. If the theory holds,
the observations must be scattered around the straight line (1).

But all these methods presuppose that we know whether we have to attribute
to $x_m$ the rank $m$ or the rank $m - 1$. Sometimes a compromise has been
proposed which consists in attributing to $x_m$ neither $m$ nor $m - 1$, but the arithmetic
mean of both, $m - \frac{1}{2}$. In other words, the index $m$ is no longer considered to be
an integer. In such cases, we call $m$ the serial number.

The corrected frequency $m - \frac{1}{2}$ may be accepted for the comparison between
the step function and the probability curve. However, for the return period
and for the equiprobability test this method leads to serious difficulties. The
corrected return periods, which have been proposed by Hazen [7] and have been
used by M. Kimball [8] are

(8)   $T(x_m) = \frac{n}{n - m + 1/2}.$

The last among $n$ observations has a return period $2n$. This idea does not seem
to be sound. No statistical device can increase the number of observations
beyond $n$.
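The three conventions for the observed return period can be compared side by side. The following sketch evaluates (6), (7) and (8) with exact fractions; the function names are illustrative and the choice $n = 20$ in the usage lines is arbitrary.

```python
from fractions import Fraction

def exceedance_interval(m, n):
    """(6): n / (n - m), defined for m = 1, ..., n - 1."""
    return Fraction(n, n - m)

def recurrence_interval(m, n):
    """(7): n / (n - m + 1)."""
    return Fraction(n, n - m + 1)

def hazen_return_period(m, n):
    """(8): n / (n - m + 1/2), written with integers as 2n / (2(n - m) + 1)."""
    return Fraction(2 * n, 2 * (n - m) + 1)

n = 20
penultimate = (exceedance_interval(n - 1, n), recurrence_interval(n - 1, n))
largest = hazen_return_period(n, n)
```

For the penultimate of 20 observations the two intervals are 20 and 10, and the corrected return period of the largest observation is 40, twice the number of observations.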

2. The adjusted frequency of the mth observation. The use of $m$, $m - 1$, or
$m - \frac{1}{2}$ as frequency of the $m$th observation amounts to considering the $m$th
value as being fixed. To obtain one and only one step function we consider $x_m$
as a statistical variate. This will lead to the determination of the most probable
serial number and of the corresponding probability as a function of $m$ and $n$.

The $m$th observation is such that there are $m - 1$ observations below it and
$n - m$ observations above it. Consequently, the distribution $w_n(x, m)$ of the
$m$th observation is

(9)   $w_n(x, m) = \binom{n}{m} m F^{m-1}(x)[1 - F(x)]^{n-m} f(x).$








The variate $x_m$ is simply called $x$ as each value of $x$ has a certain density of
probability of being the $m$th. To distinguish between $f(x)$ and $w_n(x, m)$, the first
distribution is referred to as the initial distribution. For some simple initial
distributions it is possible to calculate exactly the mean and the standard error
of the distribution (9). This has been done by Karl Pearson [10] for the normal,
the uniform, the exponential, and other skew distributions. The results are very
complicated, and do not allow any immediate practical applications.

In the following we determine therefore instead of the mean the mode of the
$m$th value. The most probable $m$th value, for which we write $\tilde{x}_m$, is the solution
of

$\frac{d \log w_n(x, m)}{dx} = 0.$

We obtain from (9)

(10)   $\frac{m - 1}{F(\tilde{x}_m)} f(\tilde{x}_m) - \frac{n - m}{1 - F(\tilde{x}_m)} f(\tilde{x}_m) = -\frac{f'(\tilde{x}_m)}{f(\tilde{x}_m)}.$


In this equation $m$ is counted in order of increasing magnitude. If we choose
the inverse order we obtain the same equation, if we replace the index $m$ by
$n - m + 1$. Therefore the following results are independent of the order of
counting $m$.

Equation (10) gives the most probable value $\tilde{x}_m$ as a function of $m$ and $n$.
The function depends upon the distribution.

A rough, first trial solution of (10) may be obtained if we confine our interest
to values where neither $m$ nor $n - m$ is small in comparison to $n$, that is, values
which are not extreme. Suppose $m$ to be of the order $n/2$. For increasing
numbers of observations, the expressions on the left side of (10) become large
compared to the right side provided the derivative remains finite, as is generally the
case. If we neglect the right-hand member, $\tilde{x}_m$ is the solution of

(11)   $F(\tilde{x}_m) = \frac{m - 1}{n - 1}.$





This expression holds for the uniform distribution where f’(z) = 0. 

The following exact solution is valid for any number of observations and any
serial number. Equation (10) will be used in two different ways: First, we suppose
m to be known, we determine the probability $F(\tilde{x}_m)$ of the most probable
mth value as a function of m and n, and attribute this probability to the mth
observation $x_m$. By doing so, the probability of the most probable mth value
becomes the adjusted frequency of the mth observation. This leads to one and
only one series of observations, and settles our initial question. Later, in
section 3, we suppose $F(\tilde{x}_m)$ to be known, and compute the corresponding most
probable mth observation. This leads to an estimate of the grades (or partition
values) from the serial numbers.













To obtain $F(\tilde{x}_m)$ from (10) we introduce an expression $\sigma^2(x_m)$ by stating

(12)  $[\sigma^2(x_m)\,n] = F(\tilde{x}_m)\,[1 - F(\tilde{x}_m)]\,f^{-2}(\tilde{x}_m)$.

The brackets are meant to indicate that the product on the left side does not
depend upon n. We shall show later that $\sigma^2(x_m)$ is under certain conditions
the variance of the mth observation. For the present purpose however this
significance is not required. Multiplication of (10) by (12) leads to

(13)  $m - 1 + F(\tilde{x}_m) - nF(\tilde{x}_m) = -f'(\tilde{x}_m)\,[\sigma^2(x_m)\,n]$,

or

(14)  $F(\tilde{x}_m) = \dfrac{m - 1}{n - 1} + \dfrac{f'(\tilde{x}_m)\,[\sigma^2(x_m)\,n]}{n - 1}$.


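The adjusted frequency in (14) is similar to (11). Another expression for the
adjusted frequency, derived from (13), is

(15)  $F(\tilde{x}_m) = \dfrac{m - \tfrac12}{n} + \dfrac{1}{n}\Bigl(F(\tilde{x}_m) - \tfrac12 + f'(\tilde{x}_m)\,[\sigma^2(x_m)\,n]\Bigr)$.

The adjusted frequency is the compromise $(m - \tfrac12)/n$ plus an expression $D/n$, where

(16)  $D = F(\tilde{x}_m) - \tfrac12 + f'(\tilde{x}_m)\,[\sigma^2(x_m)\,n]$.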


The correction, D, defined by (16) depends upon the initial distribution and has
no dimension. In general, it will depend upon the constants which exist in the
distribution. If the distribution f(x) may be written in a reduced form (3),
the correction¹

(17)  $D = G(z) - \tfrac12 + g'(z)\,[\sigma^2(z)\,n]$

depends only upon the dimensionless reduced variate z. For a given initial
distribution we choose numerical values for the probability $G(z) = F(\tilde{x}_m)$, calculate
$g'(z)$ and

(18)  $[\sigma^2(z)\,n] = \dfrac{G(z)\,\bigl(1 - G(z)\bigr)}{g^2(z)}$.

From (16) we compute a table for the corrections D as a function of the adjusted
frequencies $F(\tilde{x}_m)$ and obtain for given n the serial number m as a function of
the adjusted frequencies by

(19)  $m = nF(\tilde{x}_m) + \tfrac12 - D$.

These serial numbers will not be integers. The adjusted frequency $F(\tilde{x}_m)$ for
the mth observation will then be obtained by linear interpolation.


¹ In previous articles [3, 6] we started from another interpretation of the corrected
frequencies and obtained slightly different corrections.

























The value and the sign of the correction D depend upon the distribution.
For the asymmetrical exponential distribution, for example, the correction

(19')  $D = -\tfrac12$

is independent of the variate. This means that we have to use exclusively the
step function $(m - 1, x_m)$ as being the best way of plotting. The observed
adjusted return periods are the recurrence intervals.
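The correction (16) and the serial number (19) can be checked numerically. The following is a minimal sketch, not the author's procedure: the function names are hypothetical, the exponential law is taken with unit scale, and the standard normal stands in for any symmetrical reduced distribution. It evaluates D pointwise and shows that the exponential case gives D = -1/2 whatever the variate, while the normal case is an odd function of the reduced variate.

```python
import math

def correction_d(F, f, f_prime):
    """Correction D of eq. (16) at one point of the initial distribution.

    F, f and f_prime are the probability, the density and the derivative
    of the density at that point (all supplied by the caller).
    """
    sigma2_n = F * (1.0 - F) / f ** 2      # eq. (12): the product [sigma^2 n]
    return F - 0.5 + f_prime * sigma2_n

def serial_number(n, F, D):
    """Eq. (19): serial number m belonging to the adjusted frequency F."""
    return n * F + 0.5 - D

def d_exponential(x):
    # Assumed unit-scale exponential: F(x) = 1 - exp(-x).
    e = math.exp(-x)
    return correction_d(1.0 - e, e, -e)

def d_normal(z):
    # Standard normal, used only as an example of a symmetrical law.
    f = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    F = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return correction_d(F, f, -z * f)

if __name__ == "__main__":
    for x in (0.1, 1.0, 3.0):
        print("exponential", x, d_exponential(x))
    print("normal", d_normal(0.7), d_normal(-0.7))
```

With D = -1/2, `serial_number(n, F, -0.5)` reduces to nF + 1, which is the step function (m - 1, x_m) described above.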

For a symmetrical reduced distribution we have

(20)  $1 - G(-z) = G(z); \quad g(-z) = g(z); \quad g'(-z) = -g'(z)$.

Therefore, the reduced correction will be

(21)  $D(-z) = -D(z)$.

For the two reduced values z and $-z$ of a symmetrical variate the corrections
have the same size, but different signs.
A relation similar to (21) holds for two asymmetrical reduced distributions
$g_1(z)$ and $g_2(z)$, which are symmetrical one to another in the sense

(22)  $G_1(z) = 1 - G_2(-z); \quad g_1(z) = g_2(-z); \quad g_1'(z) = -g_2'(-z)$.

Then, the corrections are

(23)  $D_1(-z) = -D_2(z)$.

For any initial distribution f(x) we read from (19) the adjusted frequency

(24)  $F(\tilde{x}_m) = \dfrac{m - \tfrac12 + D}{n}$,

even for a small number of observations. The question whether to choose m/n
or (m - 1)/n as observed cumulative frequency is settled by (24). We obtain
one observed step function, one series for the equiprobability test, and one
series of observed return periods

(25)  $T(\tilde{x}_m) = \dfrac{n}{n - m + \tfrac12 - D}$,

which have to be compared to the theoretical continuous curves.


3. Estimates for the grades. In the following we use the fundamental
formula (15) to determine interesting grades through the mth values.

We use the term grade for the value of a statistical variate which corresponds
to a given cumulative probability F(x), say, $F(x) = \tfrac14, \tfrac12, \tfrac34$ for quartiles; $F(x) =
\tfrac1{10}, \ldots, \tfrac9{10}$ for deciles, and so on. For a given grade, the probability F(x), the
density of probability f(x) and its derivative are known, and m is unknown.
The value of m obtained from (15), henceforth called the most probable serial
number $\tilde{m}$, is the solution of







(26)  $\tilde{m} = nF(x) + 1 - F(x) - f'(x)\,F(x)\,\bigl(1 - F(x)\bigr)\,f^{-2}(x)$.

The corresponding "observed" value $x_{\tilde{m}}$ is obtained by interpolation between
two observed values $x_{m-1}$ and $x_m$, such that

$m - 1 < \tilde{m} < m$.

For the median, $x_0$, the most probable serial number $\tilde{m}_0$ is

(27)  $\tilde{m}_0 = \dfrac{n + 1}{2} - \dfrac{f'(x_0)}{4 f^2(x_0)}$.

The median $x_0$ itself enters into (27). It has to be eliminated through the
condition $F(x_0) = \tfrac12$. For the exponential distribution, for example, we find

(27')  $\tilde{m}_0 = \dfrac{n}{2} + 1$.


The most probable serial number of the median for a symmetrical distribution is 
(28)  $\tilde{m}_0 = \tfrac12 (n + 1)$.


This is the usual estimate of the median for any distribution. The estimate 
obtained from (27) is smaller (larger) than the usual estimate if the median is 
smaller (larger) than the mode. The difference between the two estimates is 
due to the fact, that (27) makes use of information about the theoretical distribu- 
tion whereas this information (if available) is neglected by the usual method. 

For symmetrical distributions the most probable serial numbers $\tilde{m}_1$ and $\tilde{m}_2$
for two symmetrical grades defined by $F_1$ and $F_2 = 1 - F_1$ are according to
(26) related by

(29)  $\tilde{m}_1 = nF_1 + 1 - \bigl(F_1 + f_1' F_1 (1 - F_1) f_1^{-2}\bigr)$
      $\tilde{m}_2 = n(1 - F_1) + \bigl(F_1 + f_1' F_1 (1 - F_1) f_1^{-2}\bigr)$.

The members in brackets have the same size, but opposite signs. Another
expression for $\tilde{m}_2$ is

$\tilde{m}_2 = (n + 1) - \bigl[nF_1 + 1 - F_1 - f_1' F_1 (1 - F_1) f_1^{-2}\bigr]$

so that, for symmetrical distributions

(30)  $\tilde{m}_1 + \tilde{m}_2 = n + 1$.

This is to be expected as the mth value counted upwards is the (n - m + 1)st
value counted downwards.

For the two quartiles $q_1$ and $q_2$ the most probable serial numbers $\tilde{m}(q_1)$ and
$\tilde{m}(q_2)$, obtained from (29) are

(31)  $\tilde{m}(q_1) = \dfrac{n + 3}{4} - \dfrac{3 f'(q_1)}{16 f^2(q_1)}; \quad \tilde{m}(q_2) = \dfrac{3n + 1}{4} - \dfrac{3 f'(q_2)}{16 f^2(q_2)}$,

where $q_1$ and $q_2$ have to be eliminated by the use of

$F(q_1) = \tfrac14; \quad F(q_2) = \tfrac34$.


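For the uniform, the normal and the exponential distribution we obtain the two
quartiles from

(31')  $\tilde{m}(q_1) = \dfrac{n + 3}{4}; \quad \tilde{m}(q_2) = \dfrac{3n + 1}{4}$
       $\tilde{m}(q_1) = \dfrac{n}{4} + .352; \quad \tilde{m}(q_2) = \dfrac{3n}{4} + .648$
       $\tilde{m}(q_1) = \dfrac{n}{4} + 1; \quad \tilde{m}(q_2) = \dfrac{3n}{4} + 1$

respectively. The last result may also be found from (19') and (24). These
estimates differ from the usual estimates by the reason given above.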
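The three pairs in (31') follow from (26) once F, f and f' are evaluated at the quartiles. A short sketch under stated assumptions: the function names and the sample size n = 51 used in the usage note are illustrative choices, the uniform law is taken on (0, 1), the exponential with unit scale, and the normal in standard form (the result does not depend on the scale).

```python
from statistics import NormalDist

def most_probable_serial_number(n, F, f, f_prime):
    """Eq. (26): m = nF + 1 - F - f' F (1 - F) / f^2."""
    return n * F + 1.0 - F - f_prime * F * (1.0 - F) / f ** 2

def quartile_serials(n, law):
    """Most probable serial numbers of the two quartiles for a named law."""
    out = []
    for F in (0.25, 0.75):
        if law == "uniform":          # density 1 on (0, 1), so f' = 0
            f, fp = 1.0, 0.0
        elif law == "exponential":    # F = 1 - exp(-x): f = 1 - F, f' = -f
            f = 1.0 - F
            fp = -f
        elif law == "normal":         # standard normal: f' = -z f
            z = NormalDist().inv_cdf(F)
            f = NormalDist().pdf(z)
            fp = -z * f
        else:
            raise ValueError(law)
        out.append(most_probable_serial_number(n, F, f, fp))
    return tuple(out)

if __name__ == "__main__":
    for law in ("uniform", "normal", "exponential"):
        print(law, quartile_serials(51, law))
```

For n = 51 the normal case returns values close to n/4 + .352 and 3n/4 + .648, in agreement with the second line of (31').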

We now apply the notion of a grade to certain characteristics which are otherwise
defined. A certain characteristic, say, the mode $\tilde{x}$ or the mean $\bar{x}$, has for a
given distribution the probability $F(\tilde{x})$ or $F(\bar{x})$ respectively. These probabilities
may be used to define a grade. We determine the corresponding mth value
from (26), and obtain an estimate of the mode or the mean, interpreted as grades,
by interpolation between the observed mth values. For a symmetrical
distribution these estimates of the mode and mean are identical with the estimates
of the median. For an asymmetrical distribution, the most probable serial
number $\tilde{m}(\tilde{x})$ of the mode becomes according to (26)

(32)  $\tilde{m}(\tilde{x}) = (n - 1)F(\tilde{x}) + 1$.


Usually, the mode Z of a continuous variate is estimated by another procedure. 
The observations are arranged in certain cells. One of them has the largest 
relative frequency. It will contain the mode. To find its position within the 
cell, an interpolation formula is applied which reproduces the content of this 
cell and of the two adjacent cells. By choosing different lengths for the cells 
and different origins for the classification, the mode can be shifted to the right or 
to the left. Formula (32) furnishes a determination of the mode from the ob- 
servations according to the theory, such that the arrangement of the observa- 
tions into different cells is not needed. Of course, this method can be applied 
only if we know the theoretical distribution f(z). The problem how to estimate 
the mode is important for distributions where one of the constants may be in- 
terpreted as the mode or as a function of the mode [1, 4]. 


4. Standard errors of the estimates. The numerical work involved in the
method (26) of estimating the grades is very small. To obtain the standard
errors of these estimates we consider the asymptotic properties of the distribution
(9). The following results hold therefore only for large numbers of observations.
Besides, we assume that the serial number m is of the order n/2, i.e. not
extreme. It has been shown [2] that under these conditions the distribution







of the mth value converges toward a normal distribution with a standard error
$\sigma(x_m)$, where

(33)  $\sigma(x_m)\sqrt{n} = \dfrac{\sqrt{F(x)\bigl(1 - F(x)\bigr)}}{f(x)}$.

Although this standard error does not contain m explicitly, it has a clear meaning
for any value of x, as we know from (26) which observation we have to attribute
to the probability F(x). The classical proof about the approximate normality
of the distribution of the median in large samples is a special case of this
convergence, and the classical standard error of the median,

(34)  $\sigma(x_0)\sqrt{n} = \dfrac{1}{2 f(x_0)}$,

is a special case of (33). The square root in (33) is maximum for $F(x) = \tfrac12$.
Therefore,

(35)  $\sigma(x_m)\sqrt{n} \le \dfrac{1}{2 f(x)}$.


If the variate x may be reduced through the linear transformation (1) the
standard error $\sigma(z)$ of the reduced variate, called reduced standard error

(36)  $[\sigma(z)\sqrt{n}] = \dfrac{1}{g(z)} \sqrt{G(z)\bigl(1 - G(z)\bigr)}$,

may be calculated as a function of z where z corresponds to $x_m$. To call
attention to the fact that these numerical values do not depend upon n, they are
written in brackets. The standard error of the estimate for $x_m$ is, according to
(2) and (3)

(37)  $\sigma(x_m) = \dfrac{b}{\sqrt{n}}\,[\sigma(z)\sqrt{n}]$.

Since the constant b is a measure of dispersion, the standard error of the estimate
of the mth value is proportional to the standard deviation of the variate. The
factors b and $1/\sqrt{n}$ show that the standard error of the mth value is of the same
structure as the standard error of the arithmetic mean.

For symmetrical distributions the standard error (33) of the mth value is also
a symmetrical function. The standard errors of the estimate of the two quartiles,
and generally of the estimates of two grades defined by F and 1 - F, are
then identical. If the mode coincides with the median, the corresponding standard
error of the mth value is a minimum. For a symmetrical U-shaped distribution,
however, where the density of probability passes through a minimum at
the center of symmetry, the median has the largest standard error among the
mth values. An example for such a distribution has been given by Leavens [9].
As the distribution of the mth value converges towards a normal distribution,
it is legitimate to attribute to the mode of the mth value the standard error (33).
















Therefore, for a large number of observations (33) gives the standard error of
our estimate of the grades. The standard errors of the estimates (31) of the
quartiles are

(38)  $\sigma(q_1)\sqrt{n} = \dfrac{\sqrt3}{4 f(q_1)}; \quad \sigma(q_2)\sqrt{n} = \dfrac{\sqrt3}{4 f(q_2)}$.

The arithmetic mean in its usual definition is not an mth value. Its standard
error $\sigma(\bar{x})$, where

(39)  $\sigma(\bar{x})\sqrt{n} = \sigma$,

will, therefore, fall outside of the range of the standard errors of the mth values.
(See graph 1.) If the distribution f(x) is such that the standard deviation
does not exist, it is legitimate to estimate the arithmetic mean as a grade, and
calculate it from the corresponding most probable mth value by introducing
$F(\bar{x})$, $f(\bar{x})$ and $f'(\bar{x})$ into (26). The standard error of the arithmetic mean
interpreted as a grade is

(40)  $\sigma(\bar{x})\sqrt{n} = \dfrac{\sqrt{F(\bar{x})\bigl(1 - F(\bar{x})\bigr)}}{f(\bar{x})}$.

If we use this estimate of the arithmetic mean for distributions where $\sigma$ exists,
the usual determination of the mean will be more (less) precise than its estimate
as a grade if

(41)  $\sigma f(\bar{x}) \lessgtr \sqrt{F(\bar{x})\bigl(1 - F(\bar{x})\bigr)}$.

The standard error of the mode estimated as a grade is

(42)  $\sigma(\tilde{x})\sqrt{n} = \dfrac{\sqrt{F(\tilde{x})\bigl(1 - F(\tilde{x})\bigr)}}{f(\tilde{x})}$.

As the standard error of any characteristic depends upon the way it is estimated
from the observations, the standard errors of the mode or mean interpreted as
grades differ from the usual standard errors.


5. The most precise grade. Equation (33) may be used to define a new grade
which has interesting properties. The standard error (33) of the estimate of the
mth value is a function of F. We ask whether it possesses a minimum (maximum).
The corresponding value of the variate, $\xi$, may be called the most
(least) precise mth value or the most (least) precise grade. To obtain
$d\sigma(x_m)/dF$ it is sufficient to calculate from (33)

$n\,\dfrac{d \log \sigma^2(x_m)}{dx} = \dfrac{2n\,\sigma'(x_m)}{\sigma(x_m)}$.

Therefore the most (least) precise grade is the solution of

(43)  $\dfrac{f(x)}{F(x)} - \dfrac{f(x)}{1 - F(x)} - \dfrac{2 f'(x)}{f(x)} = 0$.







This expression does not vanish if either $F(x) = \tfrac12$ or $f'(x) = 0$. It vanishes if
both conditions hold simultaneously. For a symmetrical distribution passing
through a mode (minimum), the mode (minimum), estimated as a grade, is the
most (least) precise grade. Equation (43) may be written

$f'(x)\,f^{-2}(x)\,F(x)\bigl(1 - F(x)\bigr) = \tfrac12 - F(x)$.

If we introduce this expression into (16), we obtain D = 0 and

(44)  $F(\xi) = \dfrac{m - \tfrac12}{n}$.

The most precise mth value is such that the adjusted frequency is the arithmetic mean
of the frequencies m/n and (m - 1)/n.

The most precise mth value $\xi$ cannot be calculated from the observations
alone. It may be estimated in the same way as any grade by introducing the
values $F(\xi)$, $f(\xi)$ and $f'(\xi)$ into equation (26).

To show the difference between the most precise grade and the mode we apply
the procedure developed above to a skew distribution. The reduced distribution
of the largest value g(y) and the probability G(y) are

(45)  $g(y) = e^{-y}\,G(y); \quad G(y) = e^{-e^{-y}}$.

The relation (1) between the reduced variate, for which we write y instead of z,
and the largest value x is

(46)  $x = u + \dfrac{y}{\alpha}$,

where $u = \tilde{x}$ is the mode and

(47)  $\dfrac{1}{\alpha} = \dfrac{\sqrt6}{\pi}\,\sigma$.

The most probable serial number $\tilde{m}(u)$ of the mode, obtained from (32), is

(48)  $\tilde{m}(u) = \dfrac{n - 1}{e} + 1$.

This equation may be used for an estimate of the constant u.

The reduced variance $\sigma^2(y)$ obtained from (36) and (45) is

(49)  $[\sigma^2(y)\,n] = e^{2y}\bigl(e^{e^{-y}} - 1\bigr)$.

A table for the reduced standard error $\sigma(y)\sqrt{n}$ has been given in a previous
publication [6]. The value $\sigma(y)\sqrt{n}$ is plotted in figure 1 for probabilities G(y)
from 0.01 to 0.95. The standard error has a minimum for a value of y located
to the left of the mode $\tilde{y} = 0$. On the same graph are plotted the reduced
standard errors for the normal distribution. As the normal reduced variate z
differs from the reduced variate y, two different scales are used for the variates.
The standard error of the estimate (48) of the mode interpreted as a grade,
obtained by introducing y = 0 into (49) is













(49')  $\sigma(u)\sqrt{n} = \dfrac{\sqrt6}{\pi}\sqrt{e - 1}\;\sigma = 1.02205\,\sigma$.

The most precise grade is

$\xi = u + \dfrac{y}{\alpha}$,

where y is the value of the reduced variate, for which the standard error (49)
is minimum. We obtain from (49) and (45) the numerical values

(50)  $y = -.46601; \quad G(y) = .20319; \quad \sigma(\xi)\sqrt{n} = .96887\,\sigma$.

FIG. 1. The Reduced Standard Error of the mth Value. (Abscissae: reduced variable y and reduced normal variable z; ordinate: reduced standard error $\sigma\sqrt{n}$; the standard error of the most probable mth value, the mode and the median are marked.)


The standard error of the most precise grade is 3 per cent smaller; the standard 
error of the mode, estimated as a grade, is 2 per cent larger than the standard 
error of the mean. 
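The numerical values in (49') and (50) can be reproduced from (47) and (49). The sketch below is only an illustration: the function names are hypothetical, and the minimum is located by bisection on the condition obtained by setting the derivative of the logarithm of (49) to zero, which with $t = e^{-y}$ reads $t = 2(1 - e^{-t})$. This is a convenient route chosen here, not one stated in the text.

```python
import math

# 1/alpha in units of sigma, eq. (47)
SCALE = math.sqrt(6.0) / math.pi

def reduced_variance(y):
    """[sigma^2(y) n] of eq. (49) for G(y) = exp(-exp(-y))."""
    return math.exp(2.0 * y) * (math.exp(math.exp(-y)) - 1.0)

def most_precise_reduced_value(tol=1e-13):
    """Reduced variate y minimizing (49).

    d/dy log[(49)] = 0 is equivalent to t = 2 (1 - exp(-t)) with t = exp(-y);
    the non-trivial root lies between 1 and 2 and is found by bisection.
    """
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - 2.0 * (1.0 - math.exp(-mid)) < 0.0:
            lo = mid
        else:
            hi = mid
    return -math.log(0.5 * (lo + hi))

def standard_error_in_sigma(y):
    """sigma(x_m) sqrt(n) expressed as a multiple of sigma, from (37) and (47)."""
    return SCALE * math.sqrt(reduced_variance(y))

if __name__ == "__main__":
    y_min = most_precise_reduced_value()
    print("y        =", y_min)
    print("G(y)     =", math.exp(-math.exp(-y_min)))
    print("grade    =", standard_error_in_sigma(y_min))
    print("mode     =", standard_error_in_sigma(0.0))
```

The printed multiples of sigma can be compared directly with the 3 per cent and 2 per cent differences quoted above.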














6. Confidence bands. The standard errors (33) of the grades may be used in
a general way for the construction of confidence bands obtained from curves which
control the fit between theory and observation. Consider first the observed
step function $(m - \tfrac12, x_m)$ and the theoretical ogive $nF(x), x$. The variate x is
plotted along the abscissa, the cumulative frequency along the ordinate. Now,
for large n any theoretical value $x_m$ which is not extreme may be interpreted as
an mth value having a normal distribution and a standard error $\sigma(x_m)$. At each
point of the graph of $nF(x), x$ which is not extreme, we construct a segment of
length $2\sigma(x_m)$ parallel to the x axis, the midpoint of the segment being on the
theoretical ogive. In other words, we add the standard error $\sigma(x_m)$ to, and subtract
it from, any corresponding value $x_m$ and attribute $nF(x)$ to the beginning
and end of these intervals. By this procedure we obtain two curves $nF(x)$,
$x \pm \sigma(x_m)$. For each observation there exists a probability P = .68268 that it
will be contained within the interval $x \pm \sigma(x_m)$.

If we apply another hypothesis to the same observations, or choose other 
values for the constants, we reach, of course, other control curves. Of two com- 
peting hypotheses the one is to be preferred where the band contains a larger 
number of observations. 

The same method may be applied to the equiprobability test and to the com- 
parison of the observed and theoretical return periods [6]. This procedure is 
legitimate for all values which are not extreme. 

In the following, we construct the confidence bands for the normal distribution

(51)  $g(z) = \dfrac{1}{\sqrt\pi}\,e^{-z^2}$.

The variate x is related to the reduced variate z by (1), which, in this case,
becomes

(52)  $x = \bar{x} + \sigma\sqrt2\,z$.

The probability G(z) is

(53)  $G(z) = \tfrac12\bigl(1 + \Phi(z)\bigr)$,

where $\Phi(z)$ stands for the Gaussian integral

(54)  $\Phi(z) = \dfrac{2}{\sqrt\pi}\displaystyle\int_0^z e^{-t^2}\,dt$.

Formulae (36) and (53) lead to the reduced standard error

(55)  $\sigma(z)\sqrt{n} = \dfrac{\sqrt\pi}{2}\,e^{z^2}\sqrt{1 - \Phi^2(z)}$,

given in the table, col. 6. The standard errors $\sigma(x_m)$ of the mth values obtained
from (37), (52) and (55) are

(56)  $\sigma(x_m) = \dfrac{\sigma\sqrt2}{\sqrt{n}}\,[\sigma(z)\sqrt{n}]$.













As a numerical example we choose the annual precipitations observed in 51
meteorological stations in Paris and its surroundings in the year 1938. We
suppose that the differences between the 51 observations are only due to chance.
The step function $(m - \tfrac12, x_m)$ is plotted in figure 2. To obtain the theoretical
ogive we compute the constants in (52). They are

(57)  $\bar{x} = 571.92; \quad \sigma\sqrt2 = 38.52$.

The theoretical values x obtained from (52), the cumulative frequencies nF(x)
obtained from the table of the Gaussian integral [11] and the standard errors

(58)  $\sigma(x_m) = 5.393\,[\sigma(z)\sqrt{n}]$,

obtained from (56) are given in the columns 2 to 5 and 7 of Table I.

FIG. 2. The Confidence Band. (Abscissa: annual rainfall in millimeters, 500 to 650; ordinate: cumulative frequency nF(x); curves shown: observations, theory, control curves.)

We trace in figure 2 the theoretical curve nF (x), x and the confidence band 
obtained from col. 7. by the methods described above. All observations are 
contained within the control curves. We may accept the theory that the differ- 
ences between the annual rainfalls observed in the 51 stations are only due to 
chance. 
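The entries of Table I follow from (52) to (56) with the constants (57). As a rough check, assuming only those formulas (the function name and the dictionary keys are hypothetical), one row of the table can be recomputed as follows. Python's `math.erf` is the same Gaussian integral as $\Phi(z)$ in (54).

```python
import math

N = 51            # number of stations
X_BAR = 571.92    # mean, eq. (57)
B = 38.52         # sigma * sqrt(2), eq. (57)

def table_row(z):
    """Recompute one row of Table I for the reduced variate +-z."""
    phi = math.erf(z)                                   # eq. (54)
    reduced_se = (0.5 * math.sqrt(math.pi) * math.exp(z * z)
                  * math.sqrt(1.0 - phi * phi))         # eq. (55)
    return {
        "x_low": X_BAR - B * z,                         # eq. (52) at -z
        "x_high": X_BAR + B * z,                        # eq. (52) at +z
        "freq_low": N * 0.5 * (1.0 - phi),              # n F(x), eq. (53), at -z
        "freq_high": N * 0.5 * (1.0 + phi),             # n F(x), eq. (53), at +z
        "reduced_se": reduced_se,
        "se": B / math.sqrt(N) * reduced_se,            # eq. (56)
    }

if __name__ == "__main__":
    for k in range(6):
        z = 0.2 * k
        r = table_row(z)
        print(f"{z:.1f} {r['x_low']:.1f} {r['x_high']:.1f} "
              f"{r['freq_low']:.2f} {r['freq_high']:.2f} "
              f"{r['reduced_se']:.3f} {r['se']:.1f}")
```

The factor `B / math.sqrt(N)` evaluates to about 5.39, the constant appearing in (58).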


7. Conclusions. To test a statistical hypothesis for a continuous variate we
use the ogive, the equiprobability method, based on (1), and the return periods
(5). The three tests may be combined on appropriate probability paper. As
the rank of the mth observation $x_m$ may be m or m - 1, we have two series of
observations. To obtain one and only one series we use for the ogive the serial
number $m - \tfrac12$, provided that the number of observations is large. Generally,
we attribute to $x_m$ an adjusted frequency, namely, the probability (15) of the
most probable mth value. The adjusted frequency is obtained from the serial
number $m - \tfrac12$ and a correction, D, equation (17), which depends upon the
distribution. The correction is important for all three tests when n is small; for
the equiprobability test and the return periods it is also important for the extreme
observations, whatever the number n.

The same correction D is used for estimating a grade through its relation (26)
to the corresponding most probable serial number $\tilde{m}$. For distributions where
the second moment does not exist, we estimate the arithmetic mean from a
grade. For asymmetrical distributions we estimate the mode from a grade
by (32) and (48).

TABLE I
Normal Confidence Band and Theoretical Frequencies of the Rainfalls

 Reduced |     Variate x     | Frequency 51 F(x) |    Reduced     | Standard
 variate |                   |                   | standard error |  error
   ±z    |   -z    |   +z    |   -z    |   +z    |   σ(z)√n       |  σ(x_m)
   (1)   |   (2)   |   (3)   |   (4)   |   (5)   |     (6)        |   (7)
 --------+---------+---------+---------+---------+----------------+---------
   0.0   |  571.9  |  571.9  |  25.50  |  25.50  |     .886       |   4.8
   0.2   |  564.2  |  579.6  |  19.82  |  31.18  |     .899       |   4.9
   0.4   |  556.5  |  587.3  |  14.58  |  36.42  |     .940       |   5.1
   0.6   |  548.8  |  595.0  |  10.10  |  40.90  |    1.012       |   5.5
   0.8   |  541.0  |  602.7  |   6.58  |  44.42  |    1.127       |   6.1
   1.0   |  533.4  |  610.4  |   4.01  |  46.99  |    1.297       |   7.0
   1.2   |  525.7  |  618.1  |   2.29  |  48.71  |                |
   1.4   |  518.0  |  625.9  |   1.22  |  49.78  |                |
   1.6   |  510.3  |  633.6  |    .60  |  50.40  |                |
   1.8   |  502.6  |  641.3  |    .28  |  50.72  |                |

In this case, we have to introduce a distinction between the mode and the most 
precise grade (43). The adjusted frequency and the estimates for grades may 
be used even for small numbers of observations. 

The standard error of these estimates is obtained, equation (33) from the 
limiting, normal, form of the distribution of the mth value, which holds, provided 
the serial number is not extreme. To control a hypothesis we construct con- 
fidence bands, which are obtained from the standard errors of the grades. 


REFERENCES

[1] R. A. Fisher and L. H. C. Tippett, "Limiting forms of the frequency distribution of
the largest or smallest member of a sample," Proc. Camb. Phil. Soc., Vol. 24,
part 2 (1928), p. 180.

[2] E. J. Gumbel, "Les valeurs extrêmes des distributions statistiques," Annales de
l'Institut Henri Poincaré, Vol. 4 (1935), Paris, p. 115.

[3] E. J. Gumbel, "Les valeurs de position d'une variable aléatoire," Comptes Rendus,
Vol. 208 (1939), Paris, p. 149.

[4] E. J. Gumbel, "The return period of flood flows," Annals of Math. Stat., Vol. 12 (1942),
p. 163.

[5] E. J. Gumbel, "Simple tests for given hypotheses," Biometrika, Vol. 32 (1942), p. 317.

[6] E. J. Gumbel, "Statistical control curves for flood discharge," Trans. Am. Geoph.
Union (1942), Washington, p. 489.

[7] Allen Hazen, Flood Flows, New York, John Wiley, 1930.

[8] B. F. Kimball, "Limited type of primary probability distribution applied to annual
flood flows," Annals of Math. Stat., Vol. 13 (1942), p. 318.

[9] Dixon H. Leavens, "Frequency distributions corresponding to time series," Jour.
Amer. Stat. Assoc., Vol. 26 (1931), p. 407.

[10] Karl and Margaret V. Pearson, "On the mean character and variance of a ranked
individual, and on the mean and variance of the interval between ranked
individuals," Biometrika, Vol. 23, parts 3, 4 (1931), p. 364; Vol. 24, parts 1, 2 (1932),
p. 203.

[11] Tables of Probability Functions, Federal Works Agency W.P.A. of New York City, 1941.













FITTING GENERAL GRAM-CHARLIER SERIES 


By Paut A. SAMUELSON 


Massachusetts Institute of Technology 


1. Introduction. Since the last part of the nineteenth century at least, it has 
been common to represent a probability distribution by means of a linear sum 
of terms consisting of a parent function and its successive derivatives. Usually 
the parent function is the Type A or normal curve, as discussed by Gram [1], 
Bruns [2], Charlier [3], and numerous others. In addition there have been 
generalizations in various directions: for example, the Type B expansion in terms 
of the Poisson parent function and its successive finite differences. 

Unlike these two types, which have a definite probability interpretation,
another generalization involves the use of other parent functions and their
derivatives (or differences) to give an approximate representation of a given
frequency curve. With this process are associated the names of Charlier, Carver
[4], Roa [5], and many others. Two general methods by which the equating of
moments of the fitted curve and the given distribution yields the appropriate
coefficients have been given by Charlier and Carver respectively. An account
of the latter's technique is more accessible to the average English speaking
statistician.

It is the purpose of the present discussion to indicate how the Charlier method 
may be simplified, and can be used to replace the Carver method. In doing 
so, I am following up the oral suggestion made some years ago by Professor 
E. B. Wilson of Harvard, that repeated integration by parts will yield the req- 
uisite coefficients very simply. At the same time certain methods implicit in 
the work of Dr. A. C. Aitken [6] show how the use of a moment generating 
function can often lighten the algebraic analysis. There will also be a brief 
indication of analogous results for general finite difference parent families; and 
attention will be called to a troublesome historical blunder which has per- 
meated the statistical literature. 


2. Alternative methods. Avoiding the overburdened expression "generating
function," I shall consider parent functions, called f(x), with the restrictive
properties:

a) Moments of all orders of f(x) exist.

b) Derivatives of any required order exist with appropriate continuity.

c) There exists high order contact at the extremities of the distribution as
defined below.

Mathematically,

a) $\displaystyle\int_{-\infty}^{\infty} x^k f(x)\,dx$ is finite for all positive integral values of k

and

c) $\displaystyle\lim_{x \to \pm\infty} x^k f^{(j)}(x) = 0$ for all positive integral values of j and k.




These conditions suffice for many statistically interesting cases, but where
desirable they can be lightened. Thus, derivatives may only be defined "almost
everywhere," and there may be finite instead of infinite limits to the distribution,
etc.

Given an arbitrary frequency curve F(x), we shall suppose it to be formally
expanded in the series

(1)  $F(x) \sim a_0 f(x) + a_1 f'(x) + a_2 f''(x) + \cdots + a_n f^{(n)}(x) + \cdots$.

For convenience in what follows, we shall assume that all distributions are given
in terms of relative frequency so that the area under both f and F is equal to
unity, so that $a_0$ may be taken as unity. The suppressed absolute frequency
can clearly be restored at any time by multiplication of both sides with the
appropriate constant. Also for algebraic convenience, many writers consider
the slightly modified form of the expansion

$F(x) \sim A_0 f(x) - \dfrac{A_1}{1!} f'(x) + \dfrac{A_2}{2!} f''(x) + \cdots + \dfrac{(-1)^n A_n}{n!} f^{(n)}(x) + \cdots$.

It is assumed without discussion that the first n coefficients in such a series are
to be determined by equating the first n moments of each side.
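I shall prove the two following identities:

(2)  $(-1)^n a_n = L_n(F) - \displaystyle\sum_{j=0}^{n-1} L_{n-j}(f)\,\bigl((-1)^j a_j\bigr)$,

where

$L_i(f^{(j)}) = \dfrac{1}{i!}\displaystyle\int_{-\infty}^{\infty} x^i f^{(j)}(x)\,dx$.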

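Alternatively

(3)  $A_n = \displaystyle\sum_{k=0}^{n} \binom{n}{k} \left[\dfrac{d^k}{d\alpha^k}\,\dfrac{1}{\int_{-\infty}^{\infty} e^{\alpha x} f(x)\,dx}\right]_{\alpha = 0} \int_{-\infty}^{\infty} x^{n-k} F(x)\,dx$.

The first of these, which I owe to Prof. Wilson, is implicit in Charlier's work.
The second, which may fairly be attributed to Aitken, may reduce the actual
work in many special cases met in practice.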

Both of these methods are closely related to the Charlier device of finding
polynomials $S_i(x)$ with the bi-orthogonal property

$\displaystyle\int_{-\infty}^{\infty} S_i(x)\,f^{(j)}(x)\,dx = 0, \quad i \ne j$.


The subscript indicates the degree of the polynomial. By means of n of the
above relationships, the polynomials can be determined except for a factor of
proportionality. By formal integration of both sides of our expansion we have
the Charlier identity

$a_i = \displaystyle\int_{-\infty}^{\infty} S_i(x)\,F(x)\,dx \Big/ \text{factor of proportionality}$.


From a theoretical standpoint, this method leaves little to be desired; but in 
practice the algebraic work increases rapidly with the number of terms to be 
included in the series. 

In the Carver method, the new parent function in question, as well as the 
function to be approximated, are both expanded in terms of the normal curve, 
thus almost doubling the numerical calculations. After some differentiation, 
the members of the Type A family are eliminated yielding in the process the 
required coefficients in terms of the new parent family. We shall see below how 
this method may be related to the three above. 


3. Useful relationship. First, two simple identities may be presented:

$L_i(f^{(j)}) = (-1)^j L_{i-j}(f), \quad i \ge j$
$L_i(f^{(j)}) = 0, \quad i < j$.

Given the above assumptions of high contact, this follows immediately from
repeated integration by parts.

Remembering that the reduced moments defined just above are the coefficients
of the powers of $\alpha$ in the series expansion of the moment generating function

$M(\alpha; f) = \displaystyle\int_{-\infty}^{\infty} e^{\alpha x} f(x)\,dx = L_0(f) + L_1(f)\,\alpha + L_2(f)\,\alpha^2 + \cdots$

we have the useful Aitken identity

(4)  $M(\alpha; f^{(j)}) = (-1)^j M(\alpha; f)\,\alpha^j$.


This, too, is the immediate consequence of repeated integration by parts. 


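4. Derivation of first method. Formally multiplying each side of (1) by
$x^n/n!$ and integrating, we have the formal identity

$L_n(F) = a_0 L_n(f) - a_1 L_{n-1}(f) + \cdots + (-1)^n a_n L_0(f)$.

This is a "triangular" system of linear equations in the unknown a's. It may
be written in matrix terms

$\begin{pmatrix} L_0(F) \\ L_1(F) \\ L_2(F) \\ \vdots \end{pmatrix} = \begin{pmatrix} L_0(f) & 0 & 0 & \cdots \\ L_1(f) & L_0(f) & 0 & \cdots \\ L_2(f) & L_1(f) & L_0(f) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \begin{pmatrix} a_0 \\ -a_1 \\ a_2 \\ \vdots \end{pmatrix}$.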






The triangular matrix has the very special property that all of its elements are
known as soon as the first column is given. For this reason, as we shall see, it
is essentially equivalent to a simple sequence of numbers. This we shall call
the sequence property. Because of this special form, the above system by simple
rearrangement may be written in the modified form

$\begin{pmatrix} L_0(F) & 0 & \cdots \\ L_1(F) & L_0(F) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} = \begin{pmatrix} L_0(f) & 0 & \cdots \\ L_1(f) & L_0(f) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \begin{pmatrix} a_0 & 0 & 0 & \cdots \\ -a_1 & a_0 & 0 & \cdots \\ a_2 & -a_1 & a_0 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$.

By appropriate definition of symbolism, this may be written in the simple matrix
form:

$L(F) = L(f)\,a(F, f)$,

since multiplication of two triangular, "sequence" matrices is commutative.
It is usually simplest to invert this triangular solution directly as in (2).
But if necessary, we may express our answer in the equivalent form

(5)  $a(F, f) = L(F)\,L(f)^{-1}$,

where the inverse to any special triangular matrix also possesses the sequence
property.
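The direct inversion referred to in (2) is a forward substitution on the triangular system. A minimal sketch, assuming the reduced moments $L_n(F)$ and $L_n(f)$ are already available as plain lists (the function name and the test distribution are illustrative, not taken from the paper):

```python
def gram_charlier_coefficients(L_F, L_f):
    """Solve L_n(F) = sum_{j=0}^{n} (-1)^j a_j L_{n-j}(f) for a_0, a_1, ...

    L_F and L_f hold the reduced moments L_0, L_1, ... of the curve to be
    fitted and of the parent function; both lists must have equal length.
    """
    if len(L_F) != len(L_f):
        raise ValueError("moment lists must have the same length")
    a = []
    for n in range(len(L_F)):
        # terms of the n-th equation that involve already known coefficients
        known = sum(L_f[n - j] * (-1) ** j * a[j] for j in range(n))
        a.append((-1) ** n * (L_F[n] - known) / L_f[0])
    return a

if __name__ == "__main__":
    # Reduced moments (moment / n!) of the unit normal curve up to order 4.
    L_f = [1.0, 0.0, 0.5, 0.0, 0.125]
    # Hypothetical curve F = f + 0.3 f'': by the identities of Section 3,
    # L_n(F) = L_n(f) + 0.3 L_{n-2}(f).
    L_F = [1.0, 0.0, 0.8, 0.0, 0.275]
    print(gram_charlier_coefficients(L_F, L_f))
```

Because the system is triangular, adding a further moment only appends a coefficient and leaves the earlier ones unchanged.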

If g is a second parent function with the properties of Section 2, we have the
relationship

$a(F, g) = a(F, f)\,a(f, g)$

which follows directly from (5). This may be generalized to

$a(f_1, f_2)\,a(f_2, f_3) \cdots a(f_{n-1}, f_n) = a(f_1, f_n)$.

If F itself is a parent function, we have

$a(F, f)\,a(f, F) = a(F, F) = 1$

$a(f, F) = a(F, f)^{-1}$.





5. Relation to old methods. In terms of our notation, the Carver method
seems to reduce to computing a(F, f) by the relationship

$a(F, f) = a(F, \phi)\,a(f, \phi)^{-1}$

where $\phi$ is the Type A parent function. It involves a doubling of the work of
coefficient determination. However, if only a few terms in the expansion are
retained, this is of negligible importance.













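The Charlier polynomials are clearly summed rows of the matrix product

\[
L(f)^{-1}
\begin{bmatrix}
1 & 0 & 0 & \cdots \\
0 & x/1! & 0 & \cdots \\
0 & 0 & x^2/2! & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}.
\]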


To know the first n of these polynomials, it is not necessary to derive 
n(n + 1)/2 different coefficients. Because of the sequence property, it is only 
necessary to derive n elements of the first column of L(f). These can be 
expressed in terms of the reduced moments of f, as did Charlier; but the rela- 
tionships are non-linear and algebraically become tedious for high n. They are 
better computed from sequence relationships. 

The above discussion suggests that the bi-orthogonal relationship between a 
parent family and suitable polynomials has no deep significance. In particular, 
there is no essential relationship to least squares as in orthogonal expansions. 
It does, however, share one important property with orthogonal functions— 
determination of later coefficients does not affect the earlier ones. But this is a 
property of all triangular reductions, orthogonal expansions being only special
cases of these.


6. Sequence properties. Ordinarily, to derive the inverse of an n × n matrix,
$n^2$ equations must be solved. For our triangular matrices, we need only solve n
equations for one column. To each triangular matrix L(f) there corresponds a
sequence $\{L_i(f)\}$, which is in fact the first column of the former. Similarly, to
$L(f)^{-1}$ there corresponds $\{l_i(f)\}$; the elements of the latter are defined by the n
equations

\[
\begin{aligned}
l_0(f)\, L_0(f) &= 1 \\
l_0(f)\, L_1(f) + l_1(f)\, L_0(f) &= 0 \\
&\;\;\vdots \\
\sum_i L_i(f)\, l_{n-i}(f) &= 0.
\end{aligned}
\]
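The n equations above can be solved one at a time, each new element depending only on those already found. A minimal sketch follows; the name `reciprocal_sequence` is ours, and the example sequence (the reduced moments of the unit normal curve) is chosen only because its reciprocal is easy to recognize.

```python
from fractions import Fraction


def reciprocal_sequence(L):
    """Return l_0, l_1, ... with l_0 L_0 = 1 and sum_i L_i l_{k-i} = 0 for every k >= 1."""
    inverse = [Fraction(1) / L[0]]
    for k in range(1, len(L)):
        # Terms of the k-th equation that involve already-known elements of the inverse.
        carried = sum(L[i] * inverse[k - i] for i in range(1, k + 1))
        inverse.append(-carried / L[0])
    return inverse


# Reduced moments of the unit normal curve, whose m.g.f. is exp(alpha^2 / 2).
L = [Fraction(1), Fraction(0), Fraction(1, 2), Fraction(0), Fraction(1, 8)]
inverse = reciprocal_sequence(L)
```

Here the result is the leading coefficients of exp(-α²/2), namely 1, 0, -1/2, 0, 1/8, which is the reciprocal of the m.g.f. of the unit normal curve.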


But these are precisely the equations involved in the formal inversion of any
linear operator system of the form

\[ \sum_i c_i h^i y = z, \tag{6} \]

where h is an operator which commutes with a constant, and for which $h^0 = 1$;
z is a known function and y unknown. Thus h may be such operators as

\[ x, \quad d/dx, \quad x\,d/dx, \quad E, \quad \Delta. \]


A particular solution of (6) is given by the formal expansion

\[ y = \sum_i C_i h^i z, \]

where the C’s bear the same relationship to the c’s as do the l’s to the L’s.
Such “reciprocal” sequences appear in many branches of applied mathematics.
In particular, they arise in the inversion of a power series. If formally,

\[ W(\alpha) = \sum_i S_i \alpha^i, \]

then

\[ W(\alpha)^{-1} = \sum_i s_i \alpha^i. \]


Thus, to any triangular matrix with the sequence property, we can formally
associate a function W(α) as well as a sequence of numbers. The calculus of
multiplication of our triangular matrices clearly “corresponds” to the calculus
of multiplication of functions, i.e. if the triangular matrices $T_1, T_2, \cdots, T_n$ and
$W_1(\alpha), W_2(\alpha), \cdots, W_n(\alpha)$ correspond, and $T_n = T_1 T_2 \cdots T_{n-1}$, then

\[ W_n(\alpha) = W_1(\alpha)\, W_2(\alpha) \cdots W_{n-1}(\alpha). \]

Also, $1/W_i(\alpha)$ corresponds to $T_i^{-1}$.
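The correspondence between triangular “sequence” matrices and power series can be checked numerically. The sketch below is illustrative only; the helper names and the two test sequences are arbitrary choices of ours.

```python
def sequence_matrix(seq):
    """Lower-triangular matrix whose (i, j) entry is seq[i - j] for i >= j, else 0."""
    n = len(seq)
    return [[seq[i - j] if i >= j else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def series_product(u, v):
    """First len(u) coefficients of the product of two power series."""
    return [sum(u[i] * v[k - i] for i in range(k + 1)) for k in range(len(u))]


u = [1, 2, 3, 4]
v = [5, 0, -1, 2]
T_u, T_v = sequence_matrix(u), sequence_matrix(v)
product = matmul(T_u, T_v)
```

The product of the two matrices is again a sequence matrix, built from the coefficients of the product series, and the two factors commute.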


7. Moment generating functions. If only for the above reasons and no
others, we should be tempted to consider the function formally defined by

\[ \sum_i L_i(f)\,\alpha^i. \]

But this is precisely the expression for the familiar moment generating function,
m. g. f.,

\[ M(\alpha; f) = \int e^{\alpha x} f(x)\,dx = \sum_i L_i(f)\,\alpha^i. \]

In this way, the method of triangular matrices joins the method used by Aitken
for the Type A family. If

\[ F(x) \sim \sum_i a_i f^{(i)}(x), \]

and we formally equate moment generating functions of each side, we get

\[ M(\alpha; F) = M(\alpha; f) \sum_{i=0}^{\infty} (-1)^i a_i \alpha^i, \tag{7} \]

by means of the Aitken identity (4). Thus $(-1)^i a_i$ equals the coefficient of $\alpha^i$
in the formal expansion of










\[ \frac{M(\alpha; F)}{M(\alpha; f)} = M(\alpha; F)\, M(\alpha; f)^{-1}. \]

Our relationship (2) follows immediately from (7); and by Taylor’s expansion in
α of $M(\alpha; f)^{-1}$, the identity (3) is quickly realized.

For many problems, the reciprocal of the m. g. f. of f(x) is itself a simple function,
so that our triangular equations may be inverted without solving linear
equations. Thus where F(x) = f(x + b), we immediately verify Taylor’s expansion
by use of familiar properties of the m. g. f. under shift of origin.
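As a short worked instance of the last remark (our own illustration, using only (7) and the behavior of the m. g. f. under a shift of origin), take F(x) = f(x + b):

```latex
\begin{aligned}
M(\alpha; F) &= \int e^{\alpha x} f(x + b)\,dx = e^{-b\alpha}\, M(\alpha; f), \\
\frac{M(\alpha; F)}{M(\alpha; f)} &= e^{-b\alpha} = \sum_{i=0}^{\infty} \frac{(-b)^i}{i!}\,\alpha^i
\quad\Longrightarrow\quad (-1)^i a_i = \frac{(-b)^i}{i!},
\qquad a_i = \frac{b^i}{i!}.
\end{aligned}
```

The series $\sum_i a_i f^{(i)}(x)$ is then Taylor's expansion of f(x + b) about x, and no linear equations have been solved.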





8. Finite difference expansions. Corresponding to integration by parts, we
have the formula

\[ \sum u_i \nabla^j v_i = (-1) \sum \Delta u_i\, \nabla^{j-1} v_i = (-1)^2 \sum \Delta^2 u_i\, \nabla^{j-2} v_i, \quad \text{etc.}, \]

provided “high contact” properties are assumed. ∇ and Δ are receding and
advancing differences respectively. Recalling the familiar property of “reduced
factorial” polynomials, $x^{(k)}/k! = x(x-1)\cdots(x-k+1)/k!$, we have

\[ \Delta^j \frac{x^{(k)}}{k!} = \frac{x^{(k-j)}}{(k-j)!} \quad (j \le k), \qquad = 0 \quad (j > k), \]

or

\[ Q_k(\nabla^j f) = (-1)^j Q_{k-j}(f) \quad (j \le k), \qquad = 0 \quad (j > k), \]

where

\[ Q_k(g) = \sum_x \frac{x^{(k)}}{k!}\, g(x). \]

In the expansion

\[ F(x) \sim a_0 f(x) + a_1 \nabla f(x) + a_2 \nabla^2 f(x) + \cdots, \]

the a’s obey laws identical to (2) and (3) where reduced factorial moments are
substituted for the reduced L moments, and the f. m. g. f.

\[ \sum_x f(x)\,(1 + \alpha)^x, \]

for the ordinary m. g. f.


9. Convergence. All of the above relationships are purely formal, without 
regard to convergence. The last is a difficult subject, and little discussed in the 
statistical literature, since applications of G-C series have been almost entirely 
concerned with empirical frequency curve fitting in which mathematical convergence
does not enter. Actually in the scanty treatments of the subject there
has arisen a confusion between the Type A G-C expansion, which equates 
moments, and the expansion of a function in orthogonal Hermite functions. 
These are not unrelated, but nevertheless they are distinct. This is well recognized
in the purely mathematical literature, but hardly at all in the literature of
statistics and physics.

The series differ by an irremovable factor of 2. If the Type A functions are
written as

\[ H_i(x)\, e^{-x^2}, \]

then the Hermite functions will take the form

\[ H_i(x)\, e^{-x^2/2}, \]

where the H’s are Hermite polynomials suitably normalized. Unfortunately
the G-C series often diverges when the H series converges. Thus, the statistically
interesting Cauchy distribution can be expanded in an H series; but since it
possesses no finite higher moments, the G-C series cannot even be defined.

It is not hard to show that the G-C expansion of F in terms of a Type A function
f(x) is equivalent to an H expansion of $F f^{-1/2}$ in terms of the H family $f^{1/2}$.
It is sufficient for convergence in the mean of the last expansion that $F f^{-1/2}$ be of
integrable square or belong to $L^2$. This means that the G-C Type A expansion
will be valid if $F f^{-1/2}$ is well behaved, not simply if F is well behaved. For F a
histogram as is often the case in practice, no difficulties of convergence arise,
although rapid convergence may be another matter. Nevertheless, many well
behaved F’s will not pass the more strict test. The reader is referred to the last
five titles in the bibliography for mathematical discussions of this problem.

The above discussion holds only for the Type A expansion. There remains the
very difficult problem of convergence conditions in the more general case. No
immediate generalization suggests itself, except the application of the results of
the “moment problem.” However, this must be handled with delicacy, since
the partial sums of the series may actually become negative over some range.


BIBLIOGRAPHY 




References 


[1] J. P. Gram, “Ueber die Entwicklung reeller Functionen in Reihen mittelst der Methode
der kleinsten Quadrate,” Journal für die Reine und Angewandte Mathematik,
Vol. 94 (1882), pp. 41-73.

[2] H. Bruns, Wahrscheinlichkeitsrechnung und Kollektivmasslehre, B. G. Teubner,
Leipzig, 1906.

[3] C. V. L. Charlier, “Über die Darstellung willkürlicher Funktionen,” Arkiv för
Matematik, Astronomi och Fysik (utgivet af k. svenska vetenskapsakademien),
Vol. 2 (1905), 35 pp.

[4] H. C. Carver, Chapter VII, Handbook of Mathematical Statistics, edited by H. L.
Rietz, Cambridge, Massachusetts, 1924.

[5] Emeterio Roa, “A number of new generating functions with application to statistics,”
1923.










[6] A. C. Aitken, Statistical Mathematics, London, 1939.

[7] V. Romanovsky, “Generalisation of some Types of the Frequency Curves of Professor
Pearson,” Biometrika, Vol. 16 (1924), pp. 106-116.

[8] Dunham Jackson, Fourier Series and Orthogonal Polynomials, (Carus Mathematical
Monograph No. 6), The Mathematical Association of America, Oberlin,
Ohio, 1941.

[9] E. H. Hildebrandt, “Systems of polynomials connected with the Charlier expansions
and the Pearson differential and difference equations,” Annals of Math.
Stat., Vol. II, 1931, pp. 379-439.


Further Literature 


1. W. Myller-Lebedeff, “Die Theorie der Integralgleichungen in Anwendung auf
einige Reihenentwicklungen,” Math. Annalen, Vol. 64 (1907), pp. 388-416.

2. E. Hille, “A class of reciprocal functions,” Annals of Math., Vol. 27 (1926), pp.
427-464. (Contains selected bibliography.)

3. M. H. Stone, “Developments in Hermite polynomials,” Annals of Math., Vol. 29
(1927), pp. 1-13.

4. W. E. Milne, “On the degree of convergence of the Gram-Charlier series,” Trans.
Amer. Math. Soc., Vol. 31 (1929), pp. 422-444.

5. Bibliography on Orthogonal Polynomials, National Research Council of the National 
Academy of Sciences, Washington, D. C., 1940. 











A METHOD OF TESTING THE HYPOTHESIS THAT TWO SAMPLES 
ARE FROM THE SAME POPULATION 


By Harold C. Mathisen


Princeton University 


1. Introduction. There are many cases in testing whether two samples are 
from the same population in which no assumption about the distribution func- 
tion of the population can be made except that it is continuous. A. Wald and 
J. Wolfowitz, [1], have developed a method of testing the hypothesis that two 
samples come from the same population based on certain kinds of runs of the 
elements from each sample in the combined ordered sample. W. J. Dixon, [2], 
has introduced a criterion for testing the same hypothesis based on the number 
of elements of the second sample falling between each successive pair of ordered 
values in the first sample. 

The problem considered here is that of devising a simple method of testing 
the hypothesis that two samples come from the same population, based on 
medians and quartiles, given only that the distribution function of the popula- 
tion is continuous. The simplest method may be described briefly as follows. 
We observe the number of elements, $m_1$, in the second sample whose values are
lower than the median of the first sample. Since the distribution of $m_1$ is independent
of the population distribution, we are able to compute significance
points from the distribution of $m_1$. These points may then be used for testing
the hypothesis at a given significance level. This will be referred to as the case
of two intervals.

This method may be easily extended to the case of any number of intervals. 
In this note we shall consider the extension to four intervals by using the median 
and the two quartiles of the first sample to establish four intervals into which 
the elements of the second sample may fall. Then, if the second sample is of
size 4m, it will be shown that, under the hypothesis that the two samples come
from the same population, 1/4 of the second sample, or m elements, will be expected
to fall in each interval. Let the number in the second sample which actually
fall in each interval be $m_1$, $m_2$, $m_3$, and $m_4$ respectively. The test function
here proposed is

\[ C = \frac{(m_1 - m)^2 + (m_2 - m)^2 + (m_3 - m)^2 + (m_4 - m)^2}{9m^2}, \tag{1} \]


where $9m^2$ is a constant, which forces C to lie on the interval 0 to 1. If the $m_i$
($i = 1, 2, 3, 4$) have values quite different from their expected value m, it is
apparent that C will be large. Therefore the greater the value of C the more
doubtful is the hypothesis that the two samples come from the same population. 
Significance values of C will be computed for several sample sizes. The ques- 
tion of whether C is the “best” four-interval criterion for testing the hypothesis 
that two samples come from the same continuous distribution is an open one 
which would depend for its answer on an extensive power function analysis. 
We shall not go into this analysis, however, but shall use C on intuitive grounds. 
This case will be referred to as the case of four intervals. The extension of the 
method of the case of four intervals to any number of intervals presents no new 
difficulties in derivation; however, we shall confine our attention to the cases of
two and four intervals.
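The counting step of both cases can be sketched as follows. This is a minimal illustration, not the author's procedure in code form: the function names are ours, the samples in the usage lines are made up, and ties with the cut points are ignored since the population is assumed continuous.

```python
from bisect import bisect_left
from fractions import Fraction


def interval_counts(first, second, k):
    """Counts of `second` in the k intervals cut by order statistics of `first`.

    `first` must have k*n + (k - 1) elements; the cuts are x_(n+1), x_(2n+2), ...
    """
    n, remainder = divmod(len(first) - (k - 1), k)
    if remainder:
        raise ValueError("first sample size must be k*n + k - 1")
    ordered = sorted(first)
    cuts = [ordered[j * (n + 1) - 1] for j in range(1, k)]
    counts = [0] * k
    for value in second:
        counts[bisect_left(cuts, value)] += 1
    return counts


def criterion_c(first, second):
    """Four-interval criterion (1): sum of (m_i - m)^2 divided by 9 m^2."""
    counts = interval_counts(first, second, 4)
    m = Fraction(len(second), 4)
    return sum((c - m) ** 2 for c in counts) / (9 * m ** 2)


# Made-up data: a first sample of size 15 (n = 3) and second samples of size 12 (m = 3).
first = list(range(15))
even = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 7.5, 8.5, 9.5, 11.5, 12.5, 13.5]
lopsided = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6]
```

With these inputs, `criterion_c(first, even)` is 0 because each interval receives exactly m = 3 elements, while `criterion_c(first, lopsided)` is 4/9. Calling `interval_counts(first, second, 2)` gives $m_1$ and $m_2$ for the two-interval case.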


2. The case of two intervals. Suppose f(x) is a continuous distribution function
with probability element f(x) dx. Let us draw a sample of size 2n + 1 from
a population having this probability element. Let the elements in the sample
be $x_1, x_2, \cdots, x_{2n+1}$ ordered from least to greatest. The median of this sample
will be $x_{n+1}$. Now consider a second sample of size 2m, and let $m_1$ be the number
of observations whose values are less than $x_{n+1}$. We call $m_2 = 2m - m_1$
the number of elements in the second sample greater than $x_{n+1}$.

Let $p = \int_{-\infty}^{x_{n+1}} f(x)\,dx$ be the probability of an observation having a value less
than $x_{n+1}$. Then the probability of an element having a value greater than
$x_{n+1}$ is $(1 - p)$. Thus we have the relation $f(x_{n+1})\,dx_{n+1} = dp$. The probability
law of the median $x_{n+1}$, given by the multinomial law¹, is


\[ P_r(x_{n+1}) = \frac{(2n+1)!}{n!\,n!}\, p^n (1-p)^n\, dp. \tag{2} \]

The conditional probability law of $m_1$, given $x_{n+1}$, is then

\[ P_r(m_1 \mid x_{n+1}) = \frac{(2m)!}{m_1!\,(2m-m_1)!}\, p^{m_1} (1-p)^{2m-m_1}. \tag{3} \]

From this it follows that the joint probability law of $x_{n+1}$ and $m_1$ is the product
of (2) and (3) or

\[ P_r(m_1, x_{n+1}) = \frac{(2n+1)!\,(2m)!}{n!\,n!\,m_1!\,(2m-m_1)!}\, p^{n+m_1} (1-p)^{n+2m-m_1}\, dp. \tag{4} \]


We may integrate (4) with respect to p from 0 to 1 as a Beta Function, leaving
the distribution function of $m_1$ independent of the population probability element
f(x) dx. We get for the distribution of $m_1$


¹ The multinomial law may be stated briefly as follows:

If a trial results in one and only one of the mutually exclusive events $E_1, E_2, \cdots, E_k$,
the probability P that in a total of n trials, $n_1$ will result in $E_1$, $n_2$ in $E_2$, $\cdots$, $n_k$ in $E_k$
$\left(\sum_{1}^{k} n_i = n\right)$, is given by

\[ P = \frac{n!}{n_1!\, n_2! \cdots n_k!}\, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}, \]

where $p_1, p_2, \cdots, p_k$ $\left(\sum_{1}^{k} p_i = 1\right)$ are the probabilities of a single trial resulting in
$E_1, E_2, \cdots, E_k$ respectively.


\[ P_r(m_1) = \frac{(2n+1)!\,(2m)!\,(n+m_1)!\,(n+2m-m_1)!}{n!\,n!\,m_1!\,(2m-m_1)!\,(2n+1+2m)!}. \tag{5} \]

From (5) a simple recursion relation between $P_r(m_1)$ and $P_r(m_1 + 1)$ may be
determined from which the probabilities of various values of $m_1$ may be rapidly
computed. For large samples it can be shown that under certain regularity
conditions, the ratio $[m_1 - E(m_1)]/\sigma_{m_1}$ may be approximated by the normal distribution²
with zero mean and unit variance. The derivation is similar to that of
the four-interval case, which is taken up in greater detail. It will be found by
the use of (4) that the expected value of $m_1$ is m, and the variance of $m_1$ is

\[ m + \frac{m(2m-1)(n+2)}{2n+3} - m^2. \]

Using this information, values of $m_1$ for various


TABLE I

The Case of Two Intervals
Lower and upper .01 and .05 percentage points for the distribution of $m_1$

Columns, in order: first sample size 2n + 1; second sample size 2m; critical values
$m_{1(.01)}$, $m_{1(.05)}$, $m_{1(.95)}$, $m_{1(.99)}$.

11    10    1    9
41    40    10    12    28    30
101    100    34    38    62    66
101    200    72    80    120    128
201    200    77    84    116    123
201    400    160    181    219    240
401    400    167    177    223    233
401    800    353    367    433    447
1001    1000    448    463    537    552


significance levels may be computed. The .01 and .05 percentage points of $m_1$
for several sample sizes are given in Table I. The values for sample sizes of 10
and 40 are computed directly from the probability law, while the larger samples
have limits computed by the normal approximation. Thus for two samples of
size 101 and 100, respectively, a value of $m_1$ less than 38 would be significant
at the .05 level. Similarly, at the upper .05 level, the hypothesis would be
rejected if a value of $m_1$ were obtained which was greater than 62. The necessity
for the upper limits could easily be eliminated by testing with respect to the
smaller of $m_1$ and $m_2$. However, for completeness, the upper percentage points




² This statement may be proved by showing that as m, n → ∞ such that m/n = constant,
the limit of the moment generating function for the ratio is identical with the moment
generating function of the normal distribution with zero mean and unit variance.










are included to show the range of values of m, in which the hypothesis that the 
two samples come from the same population may be accepted. 
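The closed form (5) is easy to evaluate exactly, which gives a check on the mean and variance quoted above and on the recursion between successive probabilities. The sketch below is illustrative; the function names are ours, and the sample sizes 11 and 10 are those of the first row of Table I.

```python
from fractions import Fraction
from math import factorial


def two_interval_pmf(n, m):
    """Exact probabilities P(m_1 = k), k = 0..2m, from the closed form (5)."""
    const = Fraction(factorial(2 * n + 1) * factorial(2 * m),
                     factorial(n) ** 2 * factorial(2 * n + 1 + 2 * m))
    return [const * Fraction(factorial(n + k) * factorial(n + 2 * m - k),
                             factorial(k) * factorial(2 * m - k))
            for k in range(2 * m + 1)]


def recursion_ratio(n, m, k):
    """P(m_1 = k + 1) / P(m_1 = k), obtained by dividing successive values of (5)."""
    return Fraction((n + k + 1) * (2 * m - k), (k + 1) * (n + 2 * m - k))


n, m = 5, 5  # first sample of 11, second sample of 10
pmf = two_interval_pmf(n, m)
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum(k * k * p for k, p in enumerate(pmf)) - mean ** 2
```

For these sizes the probabilities sum to one, the mean is m = 5, and the variance agrees with m + m(2m - 1)(n + 2)/(2n + 3) - m². Cumulative sums of `pmf` from either end give exact tail probabilities for small samples.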


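3. The case of four intervals. If we let the first sample of size 4n + 3 be
designated by $(x_1, x_2, \cdots, x_{4n+3})$, assumed drawn from a population with probability
element f(x) dx and ordered from least to greatest, then the range of x
may be divided into four intervals by $x_{n+1}$, $x_{2n+2}$, and $x_{3n+3}$. The probability
element of $x_{n+1}$, $x_{2n+2}$, $x_{3n+3}$ is

\[
\frac{(4n+3)!}{(n!)^4}
\left[\int_{-\infty}^{x_{n+1}} f\,dx\right]^n
\left[\int_{x_{n+1}}^{x_{2n+2}} f\,dx\right]^n
\left[\int_{x_{2n+2}}^{x_{3n+3}} f\,dx\right]^n
\left[\int_{x_{3n+3}}^{\infty} f\,dx\right]^n
f(x_{n+1})\,dx_{n+1}\, f(x_{2n+2})\,dx_{2n+2}\, f(x_{3n+3})\,dx_{3n+3}.
\]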


TABLE II

The Case of Four Intervals
.95 and .99 percentage points for the distribution of C

      Sample sizes
First       Second
4n + 3      4m         n       m       C_.95     C_.99
  15         12        3       3       .446      .582
  63         60       15      15       .113      .161
 103        100       25      25       .072      .102
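Let

\[
\int_{-\infty}^{x_{n+1}} f(x)\,dx = p_1, \quad
\int_{x_{n+1}}^{x_{2n+2}} f(x)\,dx = p_2, \quad
\int_{x_{2n+2}}^{x_{3n+3}} f(x)\,dx = p_3, \quad
\int_{x_{3n+3}}^{\infty} f(x)\,dx = p_4.
\]

The probability element of $p_1$, $p_2$, $p_3$, and $p_4$ is

\[ P_r(x_{i(n+1)}) = \frac{(4n+3)!}{(n!)^4}\, p_1^n p_2^n p_3^n p_4^n\, dp_1\, dp_2\, dp_3. \tag{6} \]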


Now let us consider the second sample, $(x_1, x_2, \cdots, x_{4m})$, of size 4m. Let the
number of observations falling in each of the preassigned intervals be $m_i$ ($i =
1, 2, 3, 4$), where $m_4 = 4m - m_1 - m_2 - m_3$. The conditional probability of
the $m_i$, given the values of $x_{i(n+1)}$, is also determined by the multinomial law:

\[ P_r(m_i \mid x_{i(n+1)}) = \frac{(4m)!}{m_1!\, m_2!\, m_3!\, m_4!}\, p_1^{m_1} p_2^{m_2} p_3^{m_3} p_4^{m_4}. \tag{7} \]

The joint distribution of the $p_i$ and the $m_i$ is then

\[ P_r(x_{i(n+1)}, m_i) = \frac{(4n+3)!\,(4m)!}{(n!)^4\, m_1!\, m_2!\, m_3!\, m_4!}\, p_1^{n+m_1} p_2^{n+m_2} p_3^{n+m_3} p_4^{n+m_4}\, dp_1\, dp_2\, dp_3. \tag{8} \]





























To obtain the distribution of the $m_i$ alone, the $p_i$ will be integrated out by the
Dirichlet Integral³ formula, giving a distribution which is clearly independent
of the population distribution function f(x).

\[ P_r(m_i) = \frac{(4n+3)!\,(4m)!\,(n+m_1)!\,(n+m_2)!\,(n+m_3)!\,(n+m_4)!}{(n!)^4\, m_1!\, m_2!\, m_3!\, m_4!\,(4n+4m+3)!}. \tag{9} \]


To find the expected value of the $m_i$, the probability law of $m_1$ will first be
derived. The probability function for the value of $x_{n+1}$ is

\[ P_r(x_{n+1}) = \frac{(4n+3)!}{n!\,(3n+2)!}\, p_1^n (1-p_1)^{3n+2}\, dp_1. \tag{10} \]

Then we have the conditional probability

\[ P_r(m_1 \mid x_{n+1}) = \frac{(4m)!}{m_1!\,(4m-m_1)!}\, p_1^{m_1} (1-p_1)^{4m-m_1}, \tag{11} \]

and

\[ P_r(x_{n+1}, m_1) = \frac{(4n+3)!\,(4m)!}{n!\,(3n+2)!\,m_1!\,(4m-m_1)!}\, p_1^{n+m_1} (1-p_1)^{3n+2+4m-m_1}\, dp_1. \tag{12} \]

To obtain the expected value of $m_1$, the joint distribution of $m_1$ and $p_1$ is
multiplied by $m_1$, summed on $m_1$ from 0 to 4m, and integrated on $p_1$ from 0 to 1.

\[ E(m_1) = \frac{(4n+3)!}{n!\,(3n+2)!} \int_0^1 p_1^n (1-p_1)^{3n+2} \left[ \sum_{m_1=0}^{4m} m_1\, \frac{(4m)!}{m_1!\,(4m-m_1)!}\, p_1^{m_1} (1-p_1)^{4m-m_1} \right] dp_1. \tag{13} \]

This interchange of the order of integration and summation is clearly valid.

The quantity in brackets will be recognized as the first moment of the binomial
distribution, $(p_1 + q)^{4m}$ where $q = 1 - p_1$. Therefore we have

\[ E(m_1) = \int_0^1 4m\, p_1 f(p_1)\, dp_1 = 4m\, E(p_1). \tag{14} \]

$E(p_1)$ and the higher moments of $p_1$ are found in the usual way by integrating
the distributions as Beta Functions. From this we see that the expected value
of $m_1$ is m. By repeating these operations on $m_2$, $m_3$, and $m_4$, it can be seen
that $E(m_i) = m$, which also validates the statement made in the introduction.


³ A discussion of the Dirichlet Integral may be found in Woods, Advanced Calculus, p.
167. It may be stated as follows for the problem in which we are interested:

\[ \iiint x^{l-1} y^{m-1} z^{n-1} (1 - x - y - z)^{r-1}\, dx\, dy\, dz = \frac{\Gamma(l)\,\Gamma(m)\,\Gamma(n)\,\Gamma(r)}{\Gamma(l+m+n+r)}, \]

where we integrate over the region bounded by x + y + z = 1 and the three coordinate
planes.







We have previously presented the criterion (1).
The next problem is to find a distribution function to which the distribution
of C may be fitted. A reasonable choice appears to be the Pearson Type I curve,

\[ f(x) = \frac{\Gamma(r+s)}{\Gamma(r)\,\Gamma(s)}\, x^{r-1} (1-x)^{s-1}. \tag{15} \]

The distribution of C is fitted by equating the first two moments of the two distributions
and solving for the constants r and s of the Type I distribution. Using
the theorem that the mean value of the sum of variates is equal to the sum of
their mean values, we have





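\[ E(C) = \frac{1}{9m^2} \left[ E(m_1^2) + E(m_2^2) + E(m_3^2) + E(m_4^2) - 4m^2 \right]. \tag{16} \]

Also the second moment may be written as

\[
\begin{aligned}
E(C^2) = \frac{1}{81m^4} \Big[ & E(m_1^4) + E(m_2^4) + E(m_3^4) + E(m_4^4) + 16m^4 + 2E(m_1^2 m_2^2) \\
& + 2E(m_1^2 m_3^2) + 2E(m_1^2 m_4^2) + 2E(m_2^2 m_3^2) + 2E(m_2^2 m_4^2) \\
& + 2E(m_3^2 m_4^2) - 8m^2 \{ E(m_1^2) + E(m_2^2) + E(m_3^2) + E(m_4^2) \} \Big].
\end{aligned}
\tag{17}
\]

The expected value of $m_1^2$ is found in the same manner as $E(m_1)$, and here also it
can be shown that the $E(m_i^2)$ are all equal. The same procedure holds for
$E(m_i^4)$.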





\[
\begin{aligned}
E(m_i^2) &= m + \frac{m(4m-1)(n+2)}{4n+5}, \\
E(m_i^4) &= m + \frac{7m(4m-1)(n+2)}{4n+5} + \frac{6m(4m-1)(4m-2)(n+3)(n+2)}{(4n+6)(4n+5)} \\
&\quad + \frac{m(4m-1)(4m-2)(4m-3)(n+4)(n+3)(n+2)}{(4n+7)(4n+6)(4n+5)}.
\end{aligned}
\tag{18}
\]

By using the moment generating function of the trinomial distribution, the
$E(m_i^2 m_j^2)$ may also be found in a similar manner.

\[
\begin{aligned}
E(m_i^2 m_j^2) = {} & \frac{m(4m-1)(n+1)}{4n+5} + \frac{2m(4m-1)(4m-2)(n+1)(n+2)}{(4n+6)(4n+5)} \\
& + \frac{m(4m-1)(4m-2)(4m-3)(n+2)^2(n+1)}{(4n+7)(4n+6)(4n+5)}.
\end{aligned}
\tag{19}
\]

As a result we have

\[ E(C) = \frac{4}{9m} + \frac{4(4m-1)(n+2)}{9m(4n+5)} - \frac{4}{9}. \tag{20} \]























Let E(C) = A to simplify later relations to be computed. Finally

\[
\begin{aligned}
E(C^2) = \frac{4}{81m^3} \Big[ & 1 + \frac{7(4m-1)(n+2)}{4n+5} + \frac{6(4m-1)(4m-2)(n+3)(n+2)}{(4n+6)(4n+5)} \\
& + \frac{(4m-1)(4m-2)(4m-3)(n+4)(n+3)(n+2)}{(4n+7)(4n+6)(4n+5)} + 4m^3 \\
& + \frac{3(4m-1)(n+1)}{4n+5} + \frac{6(4m-1)(4m-2)(n+1)(n+2)}{(4n+6)(4n+5)} \\
& + \frac{3(4m-1)(4m-2)(4m-3)(n+2)^2(n+1)}{(4n+7)(4n+6)(4n+5)} - \frac{8m^2(4m-1)(n+2)}{4n+5} - 8m^2 \Big].
\end{aligned}
\]

To simplify later relations we let $E(C^2) = B$.
The first two moments of the Type I distribution are easily found to be

\[ E(x) = \frac{r}{r+s}, \qquad E(x^2) = \frac{r(r+1)}{(r+s)(r+s+1)}. \]

Equating these to A and B and solving the two simultaneous equations for r and s,

\[ r = \frac{A(B-A)}{A^2 - B}, \qquad s = \frac{r(1-A)}{A}. \tag{23} \]


A number of percentage points for the Type I distribution have been computed 
by Miss Catherine Thompson, [3]. Using these limits, the hypothesis may be 
accepted or rejected as to whether or not the two samples come from the same 
population. 

Table II shows the .95 and .99 percentage points of C for three sample sizes. 
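For small samples the moments A and B can be obtained by direct enumeration of (9), which gives a check on the closed forms and on the fitted constants r and s. The sketch below is illustrative only: the function names are ours, and it stops at the fitted constants, since the percentage points themselves were taken from published tables.

```python
from fractions import Fraction
from math import factorial


def four_interval_pmf(n, m):
    """Exact distribution (9) of (m_1, m_2, m_3, m_4) as a dict keyed by the counts."""
    total = 4 * m
    const = Fraction(factorial(4 * n + 3) * factorial(total),
                     factorial(n) ** 4 * factorial(4 * n + total + 3))
    pmf = {}
    for m1 in range(total + 1):
        for m2 in range(total - m1 + 1):
            for m3 in range(total - m1 - m2 + 1):
                counts = (m1, m2, m3, total - m1 - m2 - m3)
                numerator = denominator = 1
                for k in counts:
                    numerator *= factorial(n + k)
                    denominator *= factorial(k)
                pmf[counts] = const * Fraction(numerator, denominator)
    return pmf


def criterion(counts, m):
    """Criterion (1) for one set of counts."""
    return Fraction(sum((k - m) ** 2 for k in counts), 9 * m * m)


def fit_type_one(A, B):
    """Constants r, s of the Type I curve whose first two moments are A and B."""
    r = A * (A - B) / (B - A * A)
    s = r * (1 - A) / A
    return r, s


n, m = 3, 3  # sample sizes 15 and 12, the first row of Table II
pmf = four_interval_pmf(n, m)
A = sum(criterion(c, m) * p for c, p in pmf.items())
B = sum(criterion(c, m) ** 2 * p for c, p in pmf.items())
r, s = fit_type_one(A, B)
```

For n = m = 3 the enumerated mean equals the closed form for E(C), namely 28/153, and the fitted r and s reproduce both A and B exactly. The same enumeration can be used to tabulate the exact distribution of C when 4m is small.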













4. Summary. The problem considered here is that of devising a simple 
method of testing the hypothesis that two samples are from identical populations 
having continuous distribution functions. It may be summarized briefly as 
follows. The first sample is used to establish any desired number of intervals 
into which the observations of the second sample may fall. A test criterion is 
proposed which is based on the deviations of the numbers of elements of the 
second sample which fall in the intervals from the expected values of the respec- 
tive numbers. Two cases are discussed, that of two intervals and that of four 
intervals, making use of the median and quartiles in the first sample to deter- 
mine the intervals. Tables of 1% and 5% points for several sample sizes of 
both cases are given. 











REFERENCES 


[1] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 11 (1940), p. 147.
[2] W. J. Dixon, Annals of Math. Stat., Vol. 11 (1940), p. 199.
[3] Catherine Thompson, Biometrika, Vol. 32 (1941), p. 151.










NOTES 


This section is devoted to brief research and expository articles, notes on method- 
ology and other short items. 


NOTE ON THE INDEPENDENCE OF CERTAIN QUADRATIC FORMS 


By Allen T. Craig
University of Iowa 


Various approaches to the problem of the independence of quadratic forms 
in normally and independently distributed variables have been made by R. A. 
Fisher, Cochran, Madow and others. It is the purpose of this note to point 
out a few simple propositions which, in so far as the writer is aware, have not 
had specific mention in the literature. 


1. Independence of certain quadratic forms. THEOREM 1: A necessary and
sufficient condition that two real symmetric quadratic forms, in n normally and
independently distributed variables, be independent in the probability sense is that
the product of the matrices of the forms be zero.

Let the chance variable x be normally distributed with mean zero and unit
variance. Let $x_1, x_2, \cdots, x_n$ be n independent values of x and let A and B
be two real symmetric matrices, each of order n. Write $Q_1 = \sum\sum a_{ij} x_i x_j$ and
$Q_2 = \sum\sum b_{ij} x_i x_j$ where $\|a_{ij}\| = A$ and $\|b_{ij}\| = B$. It is well known that the
generating function of the moments of the joint distribution of $Q_1$ and $Q_2$ can be
written

\[ G(\lambda, \lambda') = |I - \lambda A - \lambda' B|^{-1/2}, \]

so that

\[ |I - \lambda A - \lambda' B| = |I - \lambda A|\,|I - \lambda' B|, \tag{1} \]

for all real values of λ and λ', is necessary and sufficient for the independence of
$Q_1$ and $Q_2$.

If $Q_1$ and $Q_2$ are independent, then (1), being true for all real values of λ and
λ', is in particular true for λ = λ'. Thus

\[ |I - \lambda(A + B)| = |I - \lambda A|\,|I - \lambda B|. \tag{2} \]

Denote by $r_1$, $r_2$ and $r \le r_1 + r_2$ respectively the ranks of A, B and A + B.
Then $r = r_1 + r_2$ since (2) expresses the identity of two polynomials in λ of
degrees r and $r_1 + r_2$.
Further, if we write

\[ |I - \lambda A| = (1 - \lambda/p_1) \cdots (1 - \lambda/p_{r_1}), \]

\[ |I - \lambda B| = (1 - \lambda/q_1) \cdots (1 - \lambda/q_{r_2}), \]

and $|I - \lambda(A + B)| = (1 - \lambda/s_1) \cdots (1 - \lambda/s_{r_1+r_2})$, then, because the factorization
of polynomials is unique, each $s_j$ can be paired with one of the numbers
$p_1, \cdots, p_{r_1}, q_1, \cdots, q_{r_2}$. Thus, if $Q_1$ and $Q_2$ are independent, the rank of
A + B is the sum of the ranks of A and B, and the non-zero roots of the characteristic
equation of A + B are those of the characteristic equation of A
together with those of the characteristic equation of B. There exists an appropriately
chosen orthogonal matrix L of order n such that L'(A + B)L, L' being
the conjugate of L, is a matrix with the reciprocals of the numbers $p_1, \cdots, p_{r_1}$,
$q_1, \cdots, q_{r_2}$ on the principal diagonal and zeros elsewhere. Then L'AL and
L'BL have no overlapping non-zero elements and L'ALL'BL = 0. But L' =
$L^{-1}$, the inverse of L. Hence, upon multiplying both members of the preceding
equation on the right by L' and on the left by L, we have AB = 0. Since
A = A' and B = B', likewise BA = 0.

Conversely, suppose AB = 0. Then the matrix $(I - \lambda A)(I - \lambda' B) =
I - \lambda A - \lambda' B$. These matrices being equal, their determinants are equal and
the condition (1) for the independence of $Q_1$ and $Q_2$ is satisfied.

The theorem is readily extended to the case of the mutual independence of 
any finite number of such quadratic forms. 

The product of a non-singular matrix and a matrix of rank R is a matrix of 
rank R. Hence, every non-singular quadratic form of the kind here discussed 
is correlated with every non-identically vanishing quadratic form in the same 
variables. 


2. Conditions for independent Chi-Square distributions. The preceding 
theorem enables one to determine, by multiplication of matrices, whether real 
symmetric quadratic forms in normally and independently distributed variables 
are themselves independent in the probability sense. The following theorem 
affords a simple test as to whether the distributions are of the Chi-Square type. 

THEOREM 2: Necessary and sufficient conditions that each of two real symmetric
quadratic forms, in n normally and independently distributed variables with mean
zero and unit variance, be independently distributed as is Chi-Square, are that
the product of the matrices of the forms be zero and that each matrix equal its own
square.

If $Q_1$ and $Q_2$ are independently distributed as is Chi-Square, then AB = 0
and each of the non-zero roots of the characteristic equations of A and B is +1.
For an appropriately chosen orthogonal matrix L, of order n, L'AL is a matrix
with $r_1$ elements on the principal diagonal +1, all other elements being zero.
For such a matrix it is seen that $(L'AL)(L'AL) = L'A^2L = L'AL$ and $A^2 = A$.
A similar argument shows that $B^2 = B$.

Conversely, if AB = 0, then $Q_1$ and $Q_2$ are independent. Further, if $A^2 = A$
and $B^2 = B$, each of the non-zero roots of the characteristic equations of A and
B is +1. This follows from the fact that the roots of the characteristic equation
of the square of any matrix are themselves the squares of the roots of the
characteristic equation of that matrix. Since A and B are real and symmetric,
the roots under consideration are real. Thus $Q_1$ and $Q_2$ have independent
Chi-Square distributions with $r_1$ and $r_2$ degrees of freedom respectively.

This theorem can likewise be extended to any finite number of these quadratic 
forms. 

Of special interest is the case of, say k, quadratic forms for which the sum of
the k matrices is the identity matrix. Thus $A_1 + A_2 + \cdots + A_k = I$. By
Theorem 1, it is both necessary and sufficient for the mutual independence of the
k forms that $A_u A_v = 0$, $u \ne v$.

Now

\[ A_j = I - A_1 - \cdots - A_{j-1} - A_{j+1} - \cdots - A_k \]

and

\[ A_j^2 = A_j - A_1 A_j - \cdots - A_{j-1} A_j - A_{j+1} A_j - \cdots - A_k A_j, \]

so that $A_j = A_j^2$. In this particular case it is to be seen that the mutual independence
of the forms implies that their several distributions are of the Chi-Square type.
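The conditions of both theorems can be checked on the most familiar example, the decomposition of a sum of squares into a part due to the mean and a part due to deviations from it. The sketch below is our own illustration for n = 3, written in plain Python with exact fractions; the helper names and the two trial values of λ and λ' are arbitrary.

```python
from fractions import Fraction


def matmul(A, B):
    """Plain product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def identity_minus(terms):
    """I minus the sum of scalar * matrix over the given (scalar, matrix) pairs."""
    return [[(1 if r == c else 0) - sum(t * M[r][c] for t, M in terms)
             for c in range(3)] for r in range(3)]


third = Fraction(1, 3)
A = [[third] * 3 for _ in range(3)]  # Q_1 = 3 * (sample mean)^2
B = [[(1 if r == c else 0) - third for c in range(3)] for r in range(3)]  # Q_2 = sum of squared deviations
zero = [[0] * 3 for _ in range(3)]
lam, lam2 = Fraction(1, 5), Fraction(-2, 7)
joint = det3(identity_minus([(lam, A), (lam2, B)]))
separate = det3(identity_minus([(lam, A)])) * det3(identity_minus([(lam2, B)]))
```

Here A + B = I, AB = 0, each matrix equals its own square, and the determinant on the left of condition (1) factors as required at the trial values.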


A CHARACTERIZATION OF THE NORMAL DISTRIBUTION 
By Irving Kaplansky


Harvard University 


In 1925 R. A. Fisher gave a geometric derivation of the joint distribution of
mean and variance in samples from a normal population (Metron, Vol. 5, pp.
90-104). On examining the argument however, we find that an (apparently)
more general result is actually established: if $f(x_1) \cdots f(x_n)$ is a function g(m, s)
of the sample mean m and standard deviation s, then the probability density of
m and s in samples of n from the population f(x) is $g(m, s)\, s^{n-2}$. This condition
on f(x) is of course satisfied if f(x) is normal; in this note we shall conversely show
that for n ≥ 3 it characterizes the normal distribution. In the proof it will be
assumed that g(m, s) possesses partial derivatives of the first order, although a
weaker assumption would probably suffice.

Let us for the moment restrict the variables $x_i$ to values such that $f(x_i) > 0$.
After a change of notation we have

\[ \phi(x_1) + \cdots + \phi(x_n) = h(u, v), \]

where $\phi = \log f$, $u = x_1 + \cdots + x_n$, $v = \tfrac{1}{2}(x_1^2 + \cdots + x_n^2)$. A differentiation
yields

\[ \phi'(x_i) = h_u + h_v x_i. \]














Solving two of these equations for $h_v$, we find

\[ h_v = \frac{\phi'(x_i) - \phi'(x_j)}{x_i - x_j}, \tag{1} \]

and, for n ≥ 3, it follows that the right member of (1) is a constant, say 2A.
Then

\[ \phi'(x_i) - 2Ax_i = \phi'(x_j) - 2Ax_j = \text{a constant } B, \]

\[ \phi(x) = Ax^2 + Bx + C. \]

We now have $f(x) = e^{Ax^2 + Bx + C}$ whenever f(x) > 0; but since f(x) is continuous, this
implies $f(x) = e^{Ax^2 + Bx + C}$ everywhere.
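As a check on the direction that is "of course satisfied", the constants can be read off for a normal population. The following worked case is our own illustration; μ and σ denote the population mean and standard deviation, symbols not used in the note itself.

```latex
\begin{aligned}
\phi(x) &= -\frac{(x - \mu)^2}{2\sigma^2} - \log\!\left(\sigma\sqrt{2\pi}\right), \\
\phi(x_1) + \cdots + \phi(x_n) &= -\frac{v}{\sigma^2} + \frac{\mu u}{\sigma^2}
  - \frac{n\mu^2}{2\sigma^2} - n \log\!\left(\sigma\sqrt{2\pi}\right) = h(u, v), \\
h_v &= -\frac{1}{\sigma^2} = 2A, \qquad h_u = \frac{\mu}{\sigma^2} = B, \\
\phi'(x) &= -\frac{x - \mu}{\sigma^2} = h_u + h_v x .
\end{aligned}
```

Thus A = -1/(2σ²) is negative, as it must be for $e^{Ax^2 + Bx + C}$ to be a probability density.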









NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of general interest.
Personal Items 


Dr. Holbrook Working has been appointed Chief Statistical Consultant on 
Industrial Processes and Products in the Office of Production Research and 
Development of the War Production Board. 

Professor Harold Hotelling of Columbia University was the official representa- 
tive of the Institute of Mathematical Statistics at the Copernican Quadri- 
centennial Celebration which was held in New York City on May 24. 

Dr. Edward B. Olds has taken a position with the Curtiss-Wright Corporation. 

Dr. Nilan Norris is a Sergeant with the Fourth Statistical Control Unit of the 
Fourth Air Force with headquarters at San Francisco, California. 

Dr. Edward Helly is with the Signal Corps Training Program at Illinois Insti- 
tute of Technology. 

Dr. C. W. Cotterman is in the United States Army at Camp Grant, Illinois. 

Mr. M. D. Bingham has been commissioned an Ensign in the United States 
Naval Reserve and is stationed at Fort Schuyler, New York. 

Lt. George W. Petrie, USNR, is teaching in the Midshipmen’s School at 
Notre Dame, Ind. 








New Members 


The following persons have been elected to membership in the Institute:

Arias B., Jorge Civ. Eng. (Guatemala) Eng., Rural Electrification Administration, 420 
Locust St., St. Louis, Mo. 

Bailey, A. L. B.S. (Michigan) Stat., American Mutual Alliance, 60 East 42 St., New York, 
N. Y. 

Becker, Harold W. Instr., Mare Island Trainee School. 126 Benson Ave., Vallejo, Calif. 

Bernstein, Shirley R. B.S. (Carnegie Inst. Tech.) Res. Asst., United Steelworkers of 
America, Pittsburgh, Pa. °§501 Beverly Pl. 

Bickerstaff, Asst. Prof. Thomas A. M.A. (Mississippi) Univ. of Miss., University, Miss. 

Birnbaum, Asst. Prof. Z. William. Ph.D. (Lwow) Univ. of Wash., Seattle, Wash. 

Brumbaugh, Prof. Martin A. Ph.D. (Pennsylvania) Univ. of Buffalo, Buffalo, N. Y. 

Burrows, Glenn L. B.A. (Michigan State Coll.) Instr., Michigan State Coll., East Lans- 
ing, Mich. 

Cohen, Jozef B. B.S. (Chicago) Sage Fellow in Psychology, Cornell Univ., Ithaca, N. Y. 

Cope, Asso. Prof. T. Freeman. Ph.D. (Chicago) Queens College, Flushing, N. Y. 

Cudmore, Sedley A. M.A. (Oxford) Stat., Dominion Bur. of Stat., Ottawa, Canada. 

Cureton, Edward E. Ph.D. (Columbia) Sr. Personnel Technician, War Dept., RFD 1, 
Tauxemont, Alexandria, Va. 

De Castro, Prof. Lauro S. V. Civ. Eng. (Escola Nacional de Engenharia) Catholic Univ., 
Rio de Janeiro, Brazil. 62 rua David Campista. 

Edwards, G. D. A.B. (Harvard) Dir. of Quality Assurance, Bell Telephone Laboratories, 
463 West St., New York, N.Y. 

Gifford, Kenneth R. Student, Mass. Inst. Tech., Cambridge, Mass. 97 Bay State Rd., 
Boston, Mass. 








Gottfried, Bert. A. A.M. (Columbia) Stat. Clerk, 4300 Kaywood Dr., Mt. Rainier, Md. 

Hamilton, Prof. Thomas R. Ph.D. (Columbia) Texas A. & M. Coll., College Station, Tex. 

Heide, J.D. M.S. (Iowa) Stat., U.S. Rubber Co., 1324 Altoona Ave., Eau Claire, Wisc. 

Hilfer, Irma. M.A. (Columbia) Actuary, N. Y. C. Board of Transportation, 165 W. 97 St., 
New York, N. Y. 

Howell, John M. B.A. (UCLA) Stat., Northrop Aircraft Inc., Hawthorne, Calif. 4140 
W. 63 St., Los Angeles, Calif. 

Hurwicz, Leonid. L.L.M. (Warsaw) Res. Asso., Cowles Comm., Univ. of Chicago, Chi- 
cago, Ill. 

Kendall, Maurice G. M.A. (Cambridge) Stat., Chamber of Shipping of the United King- 
dom, Richmond House, Aldenham Rd., Bushey, Eng. 

Klein, Lawrence R. B.A. (California) Teaching Fellow, Mass. Inst. Tech., Cambridge, 
Mass. 

Kuznets, George M. Ph.D. (California) Instr., Giannini Foundation, Univ. of Calif., 
Berkeley, Calif. 

Landau, H.G. M.S. (Carnegie Inst. Tech.) Stat. Analyst, War Dept., Washington, D.C. 
2408 20 St., N.E. 

Langmuir, Charles R. Ed.M. (Harvard) Carnegie Foundation. 437 West 59 St., New 
York, N.Y. 

Levy, Henry C. L.L.B. (Fordham) Instr., N.Y.C.C., New York, N.Y. 600 West 116 St. 

Li, Jerome C. R. B.S. (Nanking) Student, Iowa State Coll., Ames, Iowa. 2184 Lincoln 
Way. 

Lieberman, Jacob E. B.S. (Brooklyn Coll.) Jr. Stat., Census Bureau, Washington, D. C. 
2422 14 St., N. E. 

Martin, Margaret P. M.A. (Minnesota) Instr., Columbia Univ., New York, N.Y. 1280 
Amsterdam Ave. 

Nash, Stanley W. B.A. (Coll. of Puget Sound) San Joaquin Experimental Range, O’Neals, 
Calif. 

Norton, Horace W. Ph.D. (London) Sr. Meteorologist, U.S. Weather Bur., Washington, 
D.C. 38118 North First Rd., Arlington, Va. 

Olds, Edward B. Ph.D. (Pittsburgh) Stat., Curtiss-Wright Corp. 298 Niagara Falls 
Blvd., Buffalo, N. Y. 

Preston, Bernard. C.P.A., 103 Park Ave., New York, N. Y. 

Rosenblatt, David. B.S. (Coll. City of N. Y.) Asst. Stat., 1422 Whittier St., N. W., 
Washington, D.C. 

Sard, Asst. Prof. Arthur. Ph.D. (Harvard) Queens College, Flushing, N. Y. 146-19 
Beech Ave. 

Schapiro, Anne. B.A. (Bryn Mawr) Jr. Analyst, Institute of Applied Econometrics, 
350 W. 57 St., New York, N. Y. 

Simpson, William B. Grad. Student, Columbia Univ., New York, N. Y. 

Springer, Melvin D. M.S. (Illinois) Asst. Instr., Univ. of Illinois, Urbana, Ill. 

Stein, Irving. B.S. (Mass Inst. Tech.) Asso. Stat., War Dept., Washington, D.C. 611 
Oglethorpe St. 

Stergion, Andrew P. M.S. (Mass Inst. Tech.) 1st Lt., USA, The Proving Center, 
Aberdeen Proving Gd., Md. 

Sternhell, Arthur I. B.A. (New York) Staff Asst., Metropolitan Life Ins. Co., 1938 E. 
Tremont Ave., Parkchester, N. Y. 

Thompson, Louis T. E. Ph.D. (Clark) Dir. Res. and Dev., Lukas-Harold Corp., In- 
dianapolis, Ind. 340 East Maple Rd. 

Tyler, Asst. Prof. George W. M.A. (Duke) Virginia Polytechnic Inst., Blacksburg, Va. 

Working, Holbrook S. Ph.D. (Wisconsin) Chief Stat. Consultant, War Production 
Board, Washington, D. C. Food Res. Inst., Stanford Univ., Calif. 













The following persons have been elected to Junior membership in the Institute: 
Blumenthal, Lydia. Hunter College, New York, N.Y. 1001 Lincoln Pl., Brooklyn, N.Y. 
Gunlogson, Lee. Univ. of Minnesota, Minneapolis, Minn. 1906 Third Ave. 

Heacock, Richard R. Oregon State Coll., Corvallis, Ore. P.O. Box 207, Seaside, Ore. 

Locatelli, Humbert J. Columbia Univ., New York, N.Y. 44 Seaman Ave. 

Mathisen, Harold C., Jr. Princeton Univ., Princeton, N. J. 4 Middle Dod Hall. 

Murphy, Ray Bradford. Princeton Univ., Princeton, N. J. 28 Godfrey Rd., Upper Mont- 
clair, N. J. 

Peters, Edward J., Jr. Georgetown Univ., Washington, D.C. 126 St. James Pl., Atlantic 
City, N. J. 

Smith, Joan T. Univ. of Minnesota, St. Paul, Minn. 673 East Nebraska Ave. 








SPECIAL COURSES IN STATISTICAL QUALITY CONTROL 


The application of statistics to quality control is now being furthered in a 
program in which the War Production Board and the U. S. Office of Education 
are cooperating to assist statisticians in various industrial areas to provide 
suitable courses of instruction sponsored by their own institutions. 

The general plan of the program has been influenced by two conclusions 
drawn from the experience gained in ESMWT courses carried on by Stanford 
University during 1942-43.¹ These conclusions were: (1) that a short full-time 
course in statistical quality control tends to be peculiarly effective; and 
(2) that it is vital to have the initial courses followed by meetings in which the 
course members gather to report on applications they have made and to receive 
encouragement and any needed assistance. 

The giving of short full-time courses presents a problem of assembling a suitable 
staff, since four instructors will ordinarily be needed. If this problem were solved 
by arranging for a single staff to tour all the principal industrial regions giving 
courses in quality control, the local leadership necessary for establishing wide- 
spread use of statistical methods of quality control in industry would not be 
developed. The program adopted seems to offer an effective solution of these 
problems. 

Under the program now in effect, the War Production Board, through its 
Office of Production Research and Development, supplies an experienced person² 
to assist with the arrangement of courses and to participate in the instruction. 
Two of the instructors in each course will ordinarily be provided by a local educa- 
tional institution, which will also promote the course and make necessary local 
arrangements through its institutional representative of the Engineering Science 
and Management War Training program. It is not considered necessary that 
the instructors provided by the institution have previous experience with statisti- 
cal quality control provided they are sufficiently competent in the theory of 
sampling, but it is desirable that at least one of them have practical experience 
with quality control. It may often happen that one of the instructors can be a 
quality control man from a local industrial establishment. The representative 
of the WPB will assist with arrangements for bringing in one (or, where needed, 
two) additional outside instructors. 

The sponsoring institution's costs for the courses, which do not include the 
salary and expenses of the representative of the WPB, may be provided through 
the ESMWT program. The follow-up work with men who have taken the 
initial courses may be arranged also as part of the ESMWT program of the 


¹ A description of these courses offered by Stanford University appeared in the Annals 
of Mathematical Statistics, March 1943, p. 96. 
² At present Professor Holbrook Working is serving in this capacity. 




educational institution sponsoring the original course. The follow-up work 
should be handled by a local instructor who participated in the original course. 

The two basic courses and the one follow-up course that have already been 
given by Stanford University were conducted under essentially the plan out- 
lined above, except that they did not have the benefit of assistance from the 
WPB. Three courses have thus far (May 25) been arranged under the new 
plan: one sponsored by Rhode Island State College, to be held during May 27 
to June 2 at Newport, and two sponsored by Stanford University, to be held 
respectively in Los Angeles, June 13 to 20, and in San Francisco, June 22 to 29. 
Preliminary steps have been taken toward the arrangement of several additional 
courses. 





REPORT OF THE NEW YORK MEETING OF THE INSTITUTE 


A joint meeting between the Institute and the American Society of Mechanical 
Engineers was held on Saturday, May 29, 1943 at the Engineering Societies 
Building, 29 West 39th Street, New York City. Of the ninety-five individuals 
attending the meeting, the following fifty-seven members of the Institute were 
present: 


Theodore W. Anderson, K. J. Arnold, Robert E. Bechhofer, B. M. Bennett, C. I. Bliss, 
Mary E. Boozer, P. Boschan, A. H. Bowker, Burton H. Camp, A. C. Cohen Jr., H. F. Dodge, 
C. Eisenhart, Mary L. Elveback, W. C. Flaherty, H. Goode, John I. Griffin, Charles C. 
Grove, Frank E. Grubbs, E. J. Gumbel, Harold Hotelling, J. M. Juran, B. F. Kimball, 
Lila Knudsen, Howard Levene, E. Vernon Lewis, Simon Lopata, Frank W. Lynch, 
Henry Mann, E. C. Molina, N. Morrison, Philip J. McCarthy, Luis F. Nanni, 
Franklin S. Nelson, M. L. Norden, P. S. Olmstead, R. F. Passano, Edward Paulson, G. A. 
D. Preinreich, A. C. Rosander, Arthur Sard, Henry Scheffé, Bernice Scherl, Edward M. 
Schrock, L. W. Shaw, William B. Simpson, S. G. Small, Arthur Stein, Andrew P. Stergion, 
M. Stevens, David F. Votaw Jr., A. Wald, Helen M. Walker, W. A. Wallis, S. S. Wilks, J. 
Wolfowitz, L. C. Young. 


The general topic of the meeting was Industrial Applications of Statistics. At 
the morning session the following papers were presented, with Professor Harold 
Hotelling presiding: 


1. On the Theory of Runs with some Application to Quality Control. 
J. Wolfowitz. 

2. On the Presentation of Data as Evidence. 
Churchill Eisenhart. 


At the afternoon session, the following papers were presented, with Mr. E. C. 
Molina as Chairman: 


1. A Sampling Inspection Plan for Continuous Production. 
H. F. Dodge. 

2. Tolerances and Product Acceptability. 
L. C. Young. 


A meeting of the Board of Directors was held after the afternoon session. 


EDWIN G. OLDS 
Secretary 


Photo-Lithoprint Reproduction 
EDWARDS BROTHERS, INC 
Lithoprinters 
ANN ARBOR, MICHIGAN 
1947 








